CN120220944A

CN120220944A - Medical record mixed data processing method, device, electronic device and storage medium

Info

Publication number: CN120220944A
Application number: CN202510674080.7A
Authority: CN
Inventors: 曹力勇; 任彩红; 胡可云; 陈联忠
Original assignee: Beijing Jiahesen Health Technology Co ltd
Current assignee: Beijing Jiahesen Health Technology Co ltd
Priority date: 2025-05-23
Filing date: 2025-05-23
Publication date: 2025-06-27

Abstract

The present invention provides a method, device, electronic device and storage medium for processing medical case record mixed data, which are applicable to the field of medical information technology. The processing method uses a first data set constructed by a mixed inverted index based on data of a first medical record model that has undergone data governance as a query basis. During the query process, the query conditions in the query request are processed based on the first data set to generate a unified DSL statement to execute the query. The method can adapt to and take into account different data structures and storage to achieve the purpose of improving the query work efficiency under any unstructured medical case record mixed data structure.

Description

Medical record mixed data processing method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of medical information technologies, and in particular, to a method and apparatus for processing medical record mixed data, an electronic device, and a storage medium.

Background

Currently, during daily medical work, large amounts of data, such as patient complaints and current medical history, are collected or generated, which are often present in unstructured form, making them difficult to store with traditional relational databases.

At present, such data are stored in non-relational databases. However, because different hospital centers may govern their data with different medical record models and different versions of data governance tools, the resulting data structures differ. Code therefore has to be written and maintained separately for each kind of data stored in the non-relational database, which makes development and maintenance cumbersome, and query efficiency is low because every query application must adapt to the different data stores and structures.

Disclosure of Invention

In view of the above, the embodiments of the present invention provide a method, an apparatus, an electronic device, and a storage medium for processing medical record mixed data, so as to solve the problem of low query efficiency caused by the inability to accommodate different data structures and storage when medical data is processed.

In order to solve the above problems, the embodiment of the present invention provides the following technical solutions:

The first aspect of the invention discloses a processing method of medical record mixed data, which comprises the following steps:

acquiring a query request, and determining a second data set from a first data set constructed in advance based on the query request, wherein the query request comprises query conditions and query parameters constructed based on first medical record models, the first medical record models are constructed by first medical record data of different types under various medical topics, and each first medical record model corresponds to one type of first medical record data;

Establishing a mapping relation between an index name and the second data set based on each index in the second data set, wherein the mapping relation is used for indicating a storage position and a data format of second medical record data in the second data set corresponding to the index;

converting the query condition in the query request into DSL statements, wherein the DSL statements at least comprise a range query statement and a matching query statement;

inquiring in the index based on the DSL statement, determining medical record data ids, and grouping the medical record data ids according to the corresponding index to obtain data groups;

For each data group, converting an index id corresponding to the medical record data id into a medical record storage id by using the mapping relation;

Acquiring third registration content pre-stored in a data registration center according to the index id, and constructing a final query condition based on the third registration content and the medical record storage id;

and inquiring a database storing the third medical record data based on the final inquiring condition, and merging the inquired third medical record data to obtain a final medical record.
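The grouping step above can be pictured with a short sketch. It assumes, purely for illustration, that each index hit arrives as a pair of (index name, medical record data id); the hit list and the name `lishi_index` are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def group_ids_by_index(hits):
    """Group medical record data ids by the index that returned them."""
    groups = defaultdict(list)
    for index_name, record_id in hits:
        groups[index_name].append(record_id)
    return dict(groups)


# Hypothetical hits returned by two different indexes.
hits = [("dsj_index", "r1"), ("lishi_index", "r7"), ("dsj_index", "r2")]
groups = group_ids_by_index(hits)
```

Each resulting group can then be resolved against its own registration content, which is what lets one query span differently structured stores.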

Preferably, the pre-constructing the first data set includes:

Acquiring first medical record data of different types under each medical topic, and constructing a corresponding first medical record model aiming at the first medical record data of each type under each medical topic;

managing the first medical record data in the first medical record model by using first data management tools of different versions to generate first data storage metadata of a mixed data structure of corresponding versions;

Registering the first data storage metadata in the data registration center to generate first registration content, wherein the data registration center at least comprises a first level and a second level, the first level comprises the first data management tool, and the second level comprises the first data storage metadata;

And constructing an inverted index by utilizing the first medical record model and the first registration content, and generating, based on the inverted index, different first data sets divided according to data sources, wherein the data sources comprise either different data centers of the same hospital, in which case a separate first data set is established for each data center when the data centers use the same data management tool version and first medical record model, or different hospitals in one area, in which case a separate first data set is established for each hospital when the hospitals use the same data management tool version and first medical record model.

Preferably, the method further comprises:

If a plurality of second data sets exist, creating the same index alias for indexes under the plurality of second data sets;

and establishing a mapping relation between the index aliases and a plurality of second data sets.

Preferably, after determining that there are a plurality of second data sets, further comprising:

and intercepting query conditions in the query request, which destroy the data structure of the second medical record data in the plurality of second data sets.

Preferably, converting the index id corresponding to the medical record data id into the medical record storage id by using the mapping relation includes:

acquiring an index id corresponding to the medical record data id;

determining a third data set corresponding to the index id based on the mapping relation;

Acquiring third registration content of third medical record data in the third data set in a data registration center, and determining a corresponding id generation rule according to the third registration content;

And converting the index id into a medical record storage id according to the id generation rule.

Preferably, the obtaining third registration content pre-stored in the data registration center according to the index id, and constructing a final query condition based on the third registration content and the medical record storage id includes:

Determining a corresponding third data set based on the index id and the mapping relation;

determining corresponding third data storage metadata based on a third data governance tool and a third medical record model in the third dataset;

Determining, in the data registry, third registration content based on the third data storage metadata;

identifying a storage location and a data format of third medical record data satisfying the query condition based on the third registration content;

and splicing the storage position, the data format and the medical record storage id of the third medical record data to obtain the final query condition.
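As a rough illustration of the id conversion and splicing described above, the sketch below treats the registration content as a small dictionary. The key names (`database`, `table`, `data_format`, `id_rule`) and the rule template are assumptions made for this example; the patent only states that the registration content records a storage location, a data format and an id generation rule.

```python
def to_storage_id(index_id, registration):
    """Apply the registered id generation rule to an index id."""
    return registration["id_rule"].format(
        table=registration["table"], index_id=index_id
    )


def build_final_condition(index_id, registration):
    """Splice storage location, data format and storage id together."""
    return {
        "database": registration["database"],
        "table": registration["table"],
        "data_format": registration["data_format"],
        "storage_id": to_storage_id(index_id, registration),
    }


# Hypothetical registration content for one data set.
registration = {
    "database": "dsj",
    "table": "inhospital",
    "data_format": "json",
    "id_rule": "{table}_{index_id}",  # assumed rule, not from the patent
}
condition = build_final_condition("42", registration)
```

Keeping the rule inside the registration content means the query code never hard-codes how a particular tool version derives its storage ids.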

Preferably, querying a database storing third medical record data based on the final query condition, merging the queried third medical record data to obtain a final medical record, including:

Inquiring a database storing third medical record data based on the medical record data inquiring field to obtain corresponding third medical record data, wherein the first medical record data comprises the third medical record data;

Converting the data format of the third medical record data to obtain third medical record data conforming to a preset standard data format;

and merging the third medical record data which accords with the preset standard data format to obtain a final medical record.
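The conversion and merging steps might look like the following sketch. It assumes that format conversion amounts to renaming source-specific keys and that records are merged by a shared id; both are simplifications chosen for illustration, and the field names (`rid`, `keshi`) and source labels are hypothetical.

```python
def to_standard(record, field_map):
    """Rename source-specific keys to the standard field names."""
    return {field_map.get(key, key): value for key, value in record.items()}


def merge_records(records, field_maps):
    """Convert each (source, record) pair and merge the results by id."""
    merged = {}
    for source, record in records:
        standard = to_standard(record, field_maps[source])
        merged.setdefault(standard["id"], {}).update(standard)
    return merged


# Hypothetical per-source key mappings onto one standard format.
field_maps = {"v1": {"rid": "id", "keshi": "department"}, "v3": {}}
records = [
    ("v1", {"rid": "42", "keshi": "Cardiology"}),
    ("v3", {"id": "42", "complaint": "chest pain"}),
]
final_records = merge_records(records, field_maps)
```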

The second aspect of the embodiment of the invention discloses a processing device for medical record mixed data, which comprises:

The first acquisition unit is used for acquiring a query request and determining a second data set from a first data set constructed in advance based on the query request, wherein the query request comprises query parameters constructed based on first medical record models, the first medical record models are constructed by different types of first medical record data under various medical topics, and each medical record model corresponds to one type of first medical record data;

the first establishing unit is used for establishing a mapping relation between an index name and the second data set based on each index in the second data set, wherein the mapping relation is used for indicating the storage position and the data format of second medical record data in the second data set corresponding to the index;

The first conversion unit is used for converting the query conditions in the query request into DSL sentences, and the DSL sentences at least comprise range query sentences and matching query sentences;

A first query unit, configured to query in the index based on the DSL statement, determine a medical record data id, and group the medical record data id according to a corresponding index to obtain a data group;

the second conversion unit is used for converting, for each data group, index ids corresponding to the medical record data ids into medical record storage ids by utilizing the mapping relation;

a second obtaining unit, configured to obtain third registration content pre-stored in a data registration center according to the index id, and construct a final query condition based on the third registration content and the medical record storage id;

And the second query unit is used for querying a database storing the first medical record data based on the final query condition, merging the queried third medical record data and obtaining the final medical record.

A third aspect of the embodiment of the present invention discloses a computer storage medium, on which a program is stored, which when executed by a processor, implements a method for processing medical record mixed data as disclosed in the first aspect of the embodiment of the present invention.

The fourth aspect of the embodiment of the invention discloses an electronic device, which comprises a memory, a processor and a program stored on the memory, wherein the processor executes the program to realize the processing method of medical record mixed data as disclosed in the first aspect of the embodiment of the invention.

According to the method, device, electronic equipment and storage medium for processing medical record mixed data provided by the embodiments of the invention, a first medical record model, a first data set and a data registration center are established in advance. The data registration center registers the mapping relation between the first medical record model and the first data storage metadata obtained after the first medical record data has undergone data governance. The first data set is generated from an inverted index constructed using the first medical record model, the first data storage metadata and the first registration content in the data registration center. When medical records are queried, a second data set that preliminarily matches the query request is determined in the first data set; a mapping relation between index names and the second data set is established based on each index in the second data set; the query conditions in the query request are converted into DSL statements; medical record data ids are determined based on the DSL statements and the mapping relation; the index id corresponding to each medical record data id is converted into a medical record storage id; third registration content pre-stored in the data registration center is acquired according to the index id; and the database is queried based on the third registration content and the medical record storage id to obtain the final medical record.
In the embodiments of the invention, the data obtained by governing different first medical record models with different versions of the first data governance tool are indexed together in a mixed inverted index to form the first data set, which serves as the basis for queries. During a query, the query conditions in the query request are processed based on the first data set to generate a unified DSL statement that executes the query. Different data structures and storage can thus be accommodated, which improves query efficiency under any unstructured medical record mixed data structure.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic diagram of a processing architecture of medical record mixed data according to an embodiment of the present invention;

Fig. 2 is an example diagram of data set construction disclosed in an embodiment of the present invention;

FIG. 3 is a flow chart of a method for processing medical record mixed data according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a processing device for medical record mixed data according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In the present invention, relational terms such as first and second, and the like may be used solely to distinguish one entity or action or layer from another entity or action or layer without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

As known in the background art, the unstructured medical record data needs to be encoded and maintained separately for each storage mode, and when the unstructured medical record data is applied to specific query, different data storage and data structures need to be adapted, so that query efficiency is low.

Therefore, the invention discloses a processing scheme for medical record mixed data, which is mainly applicable to medical record query scenarios. The data obtained by governing different first medical record models with different versions of the first data governance tool are indexed together in a mixed inverted index to form a first data set, which serves as the basis for queries. During a query, the query conditions in the query request are processed based on the first data set to generate a unified DSL statement that executes the query, so that different data structures and storage can be accommodated and query efficiency is improved under any unstructured medical record mixed data structure. The specific implementation process is as follows:

Fig. 1 is a schematic diagram of a processing architecture of medical record mixed data according to an embodiment of the present invention, which mainly includes a medical record module 10, a data registration center 11, a data set module 12 and a query module 13.

The medical record module 10 mainly comprises a first medical record model constructed based on first medical record data of different types under various medical subjects.

Different medical topics exist in the medical field, each medical topic contains different types of first medical record data, and based on the different types of first medical record data, a corresponding first medical record model is built for each type of first medical record data under each medical topic.

For example, in a specific application, the medical field includes medical topics such as clinical topics, Internet of Things topics, health resource topics and bioinformatics topics. Each medical topic has different types of data. Therefore, in the invention, a corresponding first medical record model is built based on the first medical record data of different types under each medical topic and is used as the basis of subsequent data governance.

It should be noted that, the first medical record model constructed for different data may be named by using its data features, but is essentially the first medical record model.

The following is specifically illustrated by constructing a first medical record model corresponding to different types of first medical record data under different medical subjects.

Example one: under the clinical topic, a clinical scientific research model is constructed based on the characteristics of clinical medical record data. The clinical scientific research model contains documents such as the medical record front page, the admission record, and test and examination reports, as shown in Table 1.

Table 1:

Example two: under the clinical topic, a patient portrait model is constructed based on the data characteristics of the patient. The patient portrait model contains information such as basic personal information, disease information, family history and health prediction. This information may exist in the form of a document, as shown in Table 2.

Table 2:

Example three: under the Internet of Things topic, a monitoring model is constructed based on the patient electrocardiographic monitoring data recorded by the monitoring room. The monitoring model contains information such as physical signs and electrocardiograms. This information may exist in the form of a document, as shown in Table 3.

Table 3:

The data registry 11 is pre-built with at least two tiers.

The data registration center 11 is mainly used for data management and registration based on the first medical record model constructed in the medical record module 10.

In particular, the first data governance tool may be iteratively improved over time, so that iterative versions with different version numbers exist; it is precisely because of this continual iteration that mixed data structures arise.

In the data registry 11, the version numbers of the different versions of the first data governance tool serve as the first hierarchy of the data registry 11. For example, a first data governance tool with version number v1.0, denoted adcutil1.0. The first data governance tool with version number v2.0 after iterative improvement is denoted adcutil 2.0.0.

In the data registry 11, the first data storage metadata of the mixed data structure, generated after data are governed by the different versions of the first data governance tool in the first hierarchy, serve as the second hierarchy of the data registry 11. The first data storage metadata correspond to the version of the first data governance tool used in governance. For example, when a first medical record model is governed using the first data governance tool with version number adcutil1.0, the resulting first data storage metadata correspond to the adcutil1.0 version; after the tool is iterated to obtain the version numbered adcutil 2.0.0 and the same first medical record model is governed again, the resulting first data storage metadata correspond to the adcutil2.0 version.

After the first data storage metadata of the mixed data structure is generated, a mapping relation between the first data storage metadata and its corresponding first medical record model is established and registered in the data registration center 11 to obtain the corresponding first registration content. In subsequent queries, the field path corresponding to the first medical record model is looked up based on the mapping relation registered in the data registration center 11.

It should be noted that the registration content includes, but is not limited to, a field of the first medical record model, a database name, a table, a data format, and a concatenation rule of an id for storing the first data storage metadata, and the like. The database and the table can be automatically identified and queried based on the first registration content during the subsequent query, and the data of fields required by the query can be automatically interpreted and identified.

For example, after data in the clinical scientific model is subjected to data management by the data management tool v1.0, first data storage metadata of the mixed data structure is generated, which is specifically expressed as CLINICALRESEARCH 1.0.0. The registered path after registration is adcutil 1.0.0/CLINICALRESEARCH 1.0.

After data is processed by the data processing tool v1.0, the data in the patient portrait model generates first data storage metadata of a mixed data structure, which is specifically shown as userportrait 1.0.0. The registered path after registration is adcutil 1.0.0/userportrait 1.0.

After data management is carried out on the data in the monitoring model through a data management tool v1.0, first data storage metadata of a mixed data structure is generated, and the first data storage metadata is specifically expressed as wardship 1.0.0. The registered path after registration is adcutil 1.0.0/wardship 1.0.
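The two-level registry and its registration paths can be sketched as a nested mapping. This is only an illustration under assumptions: the identifier spellings (`adcutil1.0`, `clinicalresearch1.0`) are simplified stand-ins for the names in the examples above, and the content keys (`database`, `table`, `data_format`) are hypothetical.

```python
registry = {}


def register(tool_version, metadata_name, content):
    """Store content under tool version (level 1) and metadata name (level 2)."""
    registry.setdefault(tool_version, {})[metadata_name] = content
    return f"{tool_version}/{metadata_name}"


def lookup(path):
    """Resolve a registration path back to its registration content."""
    tool_version, metadata_name = path.split("/", 1)
    return registry[tool_version][metadata_name]


# Hypothetical registration content for one governed model.
path = register(
    "adcutil1.0",
    "clinicalresearch1.0",
    {"database": "lishi", "table": "binganshouye", "data_format": "json"},
)
content = lookup(path)
```

Because the tool version is the first key, two versions of the tool can register differently structured metadata for the same model without colliding.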

The following illustrates how the data in a first medical record model are governed with the first data governance tool and then registered in the data registration center 11 to generate the first registration content.

Example four takes the clinical scientific research model as an example.

The data in the clinical scientific research model are governed using adcutil 1.0.0. Based on the clinical scientific research model, the storage structure consists of two libraries: a historical clinical scientific research medical record library (lishi) and an in-hospital scientific research medical record library (zaiyuan). In each library, clinical raw data tables and post-governance data tables are established according to clinical business. The tables include at least the raw medical record front page (binganshouye_src), the governed medical record front page (binganshouye), the raw inspection report (jianyanbaogao_src), the governed inspection report (jianyanbaogao), the raw admission record (ruyuanjilu_src), the governed admission record (ruyuanjilu), and the like.

The first registration content is a mapping relationship between the first data storage metadata obtained after the treatment and the corresponding first medical record model, and a sample of the first registration content is shown in table 4 for example.

Table 4:

The data in the clinical scientific research model are governed using adcutil 2.0.0. Based on the clinical scientific research model, the historical clinical scientific research medical records and the in-hospital scientific research medical records are placed in one library, so the storage structure is a single library (dsj), in which a historical clinical raw data table (src), a historical post-governance data table (nosrc), an in-hospital clinical raw data table (src_inh) and an in-hospital post-governance data table (nosrc_inh) are established according to clinical business. The first registration content is again the mapping relation between the first data storage metadata obtained after governance and the corresponding first medical record model; a sample of the first registration content is shown in Table 5.

Table 5:

The data in the clinical scientific research model are governed using adcutil 3.0.0. Based on the clinical scientific research model, the historical clinical scientific research medical records and the in-hospital scientific research medical records are placed in one library, so the storage structure is a single library (dsj), in which a historical clinical data table (inhistory) and an in-hospital clinical data table (inhospital) are established according to clinical business. The first registration content is again the mapping relation between the first data storage metadata obtained after governance and the corresponding first medical record model; a sample of the first registration content is shown in Table 6.

Table 6:

The data set module 12 holds the first data sets corresponding to the various first medical record models in the medical record module 10. For example, the clinical scientific research model corresponds to a clinical scientific research data set, the patient portrait model corresponds to a patient portrait data set, and the monitoring model corresponds to a monitoring data set.

Each first data set records the first medical record model, the version number of the data governance tool used to govern the data of that model, and the name of the index built over the first data storage metadata obtained after governance. The index name is unique, similar to the name of a database.

Taking a clinical scientific data set as an example, examples thereof are as follows:

Data governance tool version number: adcutil 3.0.0

First medical record model: clinical scientific research model 1.0

Index name: dsj_index

In one embodiment of the invention, the constructed index is an inverted index. The inverted index is an efficient index structure for storing association between documents and keywords, including document ID (DocId) and word frequency information (TF) of the keywords. Similarity between documents and queries is calculated by applying TF-IDF, BM25 or other term weight calculation methods, and ranked according to these similarities. It should be noted that the term weight calculation is performed dynamically at the time of the query.
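To make the idea concrete, the sketch below builds a tiny inverted index that stores document ids and term frequencies, then applies BM25 at query time. It is a simplified illustration, not the patent's implementation: whitespace tokenization, the sample documents, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to {doc_id: term frequency}; also record doc lengths."""
    postings = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        terms = text.lower().split()
        lengths[doc_id] = len(terms)
        for term, tf in Counter(terms).items():
            postings[term][doc_id] = tf
    return postings, lengths


def bm25(query, postings, lengths, k1=1.2, b=0.75):
    """Score documents against the query; weights are computed at query time."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        matches = postings.get(term, {})
        idf = math.log(1 + (n_docs - len(matches) + 0.5) / (len(matches) + 0.5))
        for doc_id, tf in matches.items():
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical chief-complaint texts used only for illustration.
docs = {
    "d1": "chest pain for three days",
    "d2": "chest tightness and chest pain after exercise",
    "d3": "fever and cough",
}
postings, lengths = build_index(docs)
ranking = bm25("chest pain", postings, lengths)
```

Only the postings (document ids and term frequencies) are stored; the scores depend on the query and on collection statistics, which is why they are computed when the query runs.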

Based on the first medical record model, an inverted index is constructed for the first data storage metadata that has undergone data governance, so that the first medical record models of the medical record module 10 are covered and fine-grained, multidimensional semantic indexes are provided for the data contents of the different first medical record models.

When the inverted index is constructed, index creation sentences are generated and executed aiming at different semantic features, so that a comprehensive inverted index system is constructed. The Analyzer component is responsible for executing word segmentation operation on the text, and splits the text into token/term. These individual words or phrases resulting from the word segmentation operation are then stored in an inverted index on disk.

Here, taking the department field and the complaint field in the clinical scientific model in the above example one as an example, the department field is "binganshouye. The Analyzer component uses specific delimiters (e.g., default spaces, periods, word segmentation operations, etc.) to segment the text into tokens, and applies specific filters to each token. After this analysis, the token is converted to terms and these terms are stored in an inverted list for the corresponding fields. The final inverted list contains term corresponding to all fields in the clinical scientific model, and the inverted list has unique identification id, namely index name.

Note that a token is a single entry produced by word segmentation and carries information such as its position and length in the corresponding field, whereas a term generally refers to the actual entry stored in the index, that is, the value that is stored and queried directly without further analysis (tokenization) or word segmentation.
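The distinction between tokens and terms can be shown with a minimal analyzer sketch. The regular-expression splitting and the lowercase filter are assumptions standing in for whatever delimiters and filters the real Analyzer component applies.

```python
import re


def analyze(text):
    """Split text into tokens with position info, then filter them into terms."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "text": match.group(),
            "position": position,       # ordinal position within the field
            "start": match.start(),     # character offset within the field
            "length": len(match.group()),
        })
    # A lowercase filter turns each token into the term stored in the index.
    terms = [token["text"].lower() for token in tokens]
    return tokens, terms


tokens, terms = analyze("Chest pain, 3 days.")
```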

In the invention, taking a clinical scientific research data set as an example, a registration path obtained after data management is carried out on a clinical scientific research model 1.0 by a data management tool adcutil 3.0.0 and registration is carried out in a data registration center is adcutil 3.0.0/CLINICALRESEARCH 1.0.

As shown in Fig. 2, building on Examples one to three, data come from sources such as a clinical data source, an Internet of Things data source and a health resource data source. Under the clinical topic and the Internet of Things topic, a corresponding clinical scientific research model, patient portrait model and monitoring model are built for each type of first medical record data. The first medical record data in these models are governed with different versions of the first data governance tool and then registered in the data registration center, and the first data storage metadata obtained after governance are stored in the corresponding databases to obtain a mixed database. An inverted index is built over the first data storage metadata based on the constructed first medical record models, and a corresponding first data set is built for each first medical record model, yielding a clinical scientific research data set, a patient portrait data set and a monitoring data set.

When a user initiates a query request, the query module 13 acquires the query information in the request, constructs query parameters according to the specification of the first medical record model, and then, based on the query parameters, determines the second data set or sets involved from the pre-constructed first data sets. If a plurality of second data sets exist, any query conditions in the request that would destroy the data structure need to be intercepted. Once the one or more second data sets are finally determined, the mapping is built as follows. For a single second data set, a mapping relationship between the index name and the data set is established based on the index names in that second data set. For multiple second data sets, the same index alias is created for the indexes under each second data set, and a mapping relationship between the index alias and the second data sets is established.

By creating a concise and clear alias for each index involved, the writing of query statements can be simplified, and unified management and maintenance of the indexes in the multiple second data sets involved is facilitated. Meanwhile, the index alias can be utilized in the subsequent query process, and the indexes can be referenced and operated more conveniently.

Using the mapping relationship between the index name (index alias) and the second data set constructed in the query module 13, the corresponding second data set can be determined. Based on the version number of the data management tool and the second medical record model contained in that second data set, the path in the data registration center at which the second data storage metadata is registered can then be determined, that metadata being obtained after the second data management tool manages data based on the second medical record model. Therefore, the mapping relationship between the index alias and the second data set accurately indicates the storage location and the data format of the specific data corresponding to each index.
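The choice between an index name and a shared index alias can be illustrated with a minimal Python sketch. It is not the patented implementation: the `DataSet` class, its attribute names, the alias `query_alias`, the index names and the registry paths are all hypothetical and serve only to show how one mapping can lead from an index back to the tool version, model and registration path of its data set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """Hypothetical stand-in for one 'second data set'."""
    name: str
    index_name: str
    model: str           # medical record model the data was managed under
    tool_version: str    # version of the data management tool
    registry_path: str   # assumed path of the registered storage metadata


def build_mapping(datasets, alias="query_alias"):
    """Return (query target, mapping from index name to data set).

    One data set: the index is queried by its own name.
    Several data sets: every index is queried through one shared alias.
    """
    by_index = {ds.index_name: ds for ds in datasets}
    target = datasets[0].index_name if len(datasets) == 1 else alias
    return target, by_index


inpatient = DataSet("inpatient", "idx_inpatient", "clinical_research",
                    "1.0.0", "registry/1.0.0/clinical_research")
outpatient = DataSet("outpatient", "idx_outpatient", "clinical_research",
                     "2.0.0", "registry/2.0.0/clinical_research")

target, by_index = build_mapping([inpatient, outpatient])
print(target, by_index["idx_outpatient"].registry_path)
```

Keeping the mapping keyed by index name means that, even when the query runs through the alias, each returned record can still be traced to the storage and format of the data set it came from.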

Taking a query parameter of the clinical scientific research model as an example, a sample query parameter based on the clinical scientific research model is:

{
  "expressions": [
    [
      {
        "field": "medical records front page_department",
        "values": ["Cardiology department"],
        "exp": "equals"
      },
      {
        "field": "Admission record_complaint",
        "values": ["Chest pain"],
        "exp": "contains"
      }
    ]
  ],
  "size": 10,
  "page": "0",
  "fields": [
    [{"field": "inspection report_inspection item name"}],
    [{"field": "inspection report_inspection item"}]
  ]
}

The query module 13 is further configured to convert the specific content of the query condition into a DSL statement, where the DSL statement includes a range query and a match query.

For example, the query request may include a query condition to find the medical records of all patients who are between 30 and 50 years of age and have a particular condition. This query condition is converted into a DSL statement combining a range query (on the age field) and a match query (on the condition description field), such as:

{"query": {"bool": {"must": [{"range": {"age": {"gte": 30, "lte": 50}}}, {"match": {"symptom_description": "certain specific disorder"}}]}}}

After converting the query condition into the DSL statement, the query module 13 acquires the medical record data id satisfying the query condition by executing the DSL statement in the index.
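The conversion from query conditions to a DSL statement can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the condition keys (`field`, `exp`, `values`), the `between` operator and the function name `to_dsl` are hypothetical; only the range and match structure of the output follows the DSL example given above.

```python
import json


def to_dsl(conditions):
    """Translate a list of simple conditions into a bool/must DSL query."""
    must = []
    for cond in conditions:
        field, op = cond["field"], cond["exp"]
        if op == "between":      # assumed operator name for a range query
            low, high = cond["values"]
            must.append({"range": {field: {"gte": low, "lte": high}}})
        elif op == "contains":   # mapped to a match query in this sketch
            must.append({"match": {field: cond["values"][0]}})
        else:
            raise ValueError(f"unsupported operator: {op}")
    return {"query": {"bool": {"must": must}}}


dsl = to_dsl([
    {"field": "age", "exp": "between", "values": [30, 50]},
    {"field": "symptom_description", "exp": "contains",
     "values": ["chest pain"]},
])
print(json.dumps(dsl, indent=2))
```

Because every condition becomes one clause in the same `must` list, the same statement can be executed unchanged against a single index or against an alias covering several indexes.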

It should be noted that, the medical record data exists in the form of documents, and each medical record document has a unique identification id, and the unique identification id is not only the medical record data id but also the index id, so that DSL statements are executed in the index, and the medical record data id meeting the query condition can be obtained.

After obtaining the medical record data id, the query module 13 groups according to the corresponding index to obtain a corresponding data group.

It should be noted that different indexes correspond to different storage and data formats. Medical record data ids belonging to the same index are therefore grouped together, so that the storage address and data format of the second medical record data can be looked up according to the mapping relationship between the index name (index alias) and the second data set. The data obtained after processing the different data groups is then combined by heterogeneous merging.

For example, if there are two indexes, one for the medical records of the inpatient and the other for the medical records of the outpatient, the acquired medical record data ids are grouped according to the two indexes, and then subsequent query and processing can be performed on the medical record data of the inpatient and the outpatient respectively.
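A minimal Python sketch of this grouping step is given below. The shape of a search hit (a dictionary carrying `_index` and `_id`) is an assumption modeled on typical search-engine responses, and the index names and ids are illustrative values rather than data from the source.

```python
from collections import defaultdict


def group_ids_by_index(hits):
    """Group medical record data ids by the index each hit came from."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["_index"]].append(hit["_id"])
    return dict(groups)


# Illustrative hits only: one inpatient index and one outpatient index.
hits = [
    {"_index": "idx_inpatient", "_id": "local#2#000000A16100#3"},
    {"_index": "idx_outpatient", "_id": "local#1#000000B27200#1"},
    {"_index": "idx_inpatient", "_id": "local#2#000000A16101#1"},
]
groups = group_ids_by_index(hits)
print(groups)
```

Each resulting group can then be processed with the storage location and data format of its own index before the results are merged.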

After obtaining the data groups, the query module 13, for each data group, uses the mapping relationship between the index name (index alias) and the second data set to look up the second data set corresponding to the medical record data id, determines the second registration content corresponding to that second data set in the data registration center based on the path in the second data set, and converts the index id corresponding to the medical record data id into the medical record storage id based on the second registration content and the id generation rule.

The medical record storage id is an id of data storage in the database.

The registration content of the first data storage metadata differs according to the version of the first data management tool used for data management, and the converted medical record storage ids therefore also differ between versions. For example:

When the version number of the first data management tool is adcutil 1.0.0, the medical record data id local#2#000000A16100#3 is converted into the medical record storage id local#000000A16100#3; that is, the intermediate outpatient/inpatient identifier is removed.

When the version number of the first data management tool is adcutil 2.0.0, the medical record data id local#2#000000A16100#3 is converted into the medical record storage id local#000000A16100#3#binganshouye; that is, the intermediate outpatient/inpatient identifier is removed and a specific document name is added.

When the version number of the first data management tool is adcutil 3.0.0, the medical record data id ff7eLOCAL##2##T001081004-h9#8 is converted into the medical record storage id 76aaLOCAL##T001081004-h9#8#2##binganshouye##T001081004-h9@1; that is, the first four characters of the medical record data id are deleted, the first four characters of the medical record storage id are recalculated and generated, and the document identifier and the unique identifier of each document's data are added.
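The first two conversion rules can be sketched in Python as follows, with the ids written without spaces. The function name `to_storage_id` and the names given to the four id components are assumptions made for illustration. The third rule is deliberately left out because the source does not state how the four-character prefix is recalculated.

```python
def to_storage_id(record_id, tool_version, document_name=None):
    """Convert a medical record data id into a medical record storage id.

    Covers only the adcutil 1.0.0 and 2.0.0 examples; the component
    names below are guesses at what each '#'-separated part means.
    """
    hospital, visit_type, patient_no, visit_no = record_id.split("#")
    # Both versions drop the outpatient/inpatient identifier (visit_type).
    base = "#".join([hospital, patient_no, visit_no])
    if tool_version == "1.0.0":
        return base
    if tool_version == "2.0.0":
        if document_name is None:
            raise ValueError("version 2.0.0 needs a document name")
        return f"{base}#{document_name}"
    raise NotImplementedError(f"no rule sketched for version {tool_version}")


print(to_storage_id("local#2#000000A16100#3", "1.0.0"))
print(to_storage_id("local#2#000000A16100#3", "2.0.0", "binganshouye"))
```

In practice the rule would be read from the registration content rather than hard-coded per version, so that a new tool version only requires a new registration entry.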

The query module 13 is further configured to determine a third dataset mapped by the index id in the second dataset based on the index id and a mapping relationship between the index name (index alias) and the second dataset, determine corresponding third data storage metadata through a third data management tool and a third medical record model in the third dataset, determine corresponding third registration content in the data registration center based on the third data storage metadata, automatically interpret and identify data of a required field for the third registration content according to a query condition, and splice the obtained medical record data query field with the medical record storage id to obtain a final query condition.

For example, if fields such as a test item and a test value of a patient are to be queried, according to the storage address and the data format, the medical record data id of the patient is combined, and the final query conditions are as follows:

"db.getCollection('jianyanbaogao').find({"_id":"BJDXDSYY#000527478900#4"},{"jianyanbaogao.lab_report.lab_sub_item_name":1,"jianyanbaogao.lab_report.lab_qual_result":1})".

The query module 13 executes the final query condition by connecting the databases, obtains the required third medical record data from the corresponding databases, and if the third medical record data with different formats are obtained, performs heterogeneous merging on the obtained third medical record data with different formats to obtain third medical record data with uniform formats, thereby forming medical records.
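How the final query condition might be spliced together from the registration content and the medical record storage id is sketched below in Python. The layout of the `registration` dictionary, its keys and the human-readable field labels are hypothetical; the collection name, field paths and id are taken from the example query above.

```python
def build_final_query(registration, storage_id, wanted_fields):
    """Splice storage location, field paths and storage id into one query.

    Returns (collection name, filter document, projection document).
    """
    collection = registration["collection"]
    projection = {registration["field_paths"][name]: 1 for name in wanted_fields}
    return collection, {"_id": storage_id}, projection


# Hypothetical registration content for the test report document.
registration = {
    "collection": "jianyanbaogao",
    "field_paths": {
        "test item name": "jianyanbaogao.lab_report.lab_sub_item_name",
        "qualitative result": "jianyanbaogao.lab_report.lab_qual_result",
    },
}

collection, filter_doc, projection = build_final_query(
    registration,
    "BJDXDSYY#000527478900#4",
    ["test item name", "qualitative result"],
)
print(collection, filter_doc, projection)
```

With a MongoDB driver such as pymongo, the three parts could be passed as `db[collection].find(filter_doc, projection)`; which store and driver are actually used is not stated in the source.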

It should be noted that, the queried medical record data may come from different databases, and the databases may be non-relational databases or other types of storage systems for storing the medical record data, so that medical record data with the same data format needs to be obtained through heterogeneous merging and returned to form medical records, so as to ensure consistency of the medical record data, and facilitate subsequent use and analysis by users.

It should be noted that the first medical record data, the second medical record data and the third medical record data all refer to medical record data; "first", "second" and "third" are only used to distinguish data ranges. In the present application, the third medical record data is included in the second medical record data, and the second medical record data is included in the first medical record data. Other items distinguished by "first", "second" and "third" are likewise in an inclusion relationship.

When the data groups are merged, because the data versions required by different products are inconsistent, heterogeneous merging converts the data according to the correspondence between the fields and the data formats required by each product's calling interface. For example, adcutil 3.0.0 stores repeatable documents one by one, whereas adcutil 1.0.0 and adcutil 2.0.0 store them together in one block; when the data groups are merged, heterogeneous merging combines the documents that were stored one by one.

In the processing architecture of medical record mixed data disclosed by the embodiment of the invention, the data obtained after different first medical record models are managed by first data management tools of different versions is organized through a mixed inverted index to form the first data set, which serves as the query basis. During the query process, the query conditions in the query request are processed based on the first data set to generate a unified DSL statement with which the query is executed. Different data structures and storage can thus be accommodated, improving query efficiency under any unstructured medical record mixed data structure.

Fig. 3 is a flowchart of a method for processing medical record mixed data according to an embodiment of the present invention. The method mainly includes:

S301, acquiring a query request, and determining a second data set from a first data set constructed in advance based on the query request.

In step S301, the query request includes query conditions and query parameters constructed based on a first medical record model, where the first medical record model is constructed by different types of first medical record data under various medical topics, and each medical record model corresponds to one type of first medical record data, and examples of specific first medical record models can be referred to in examples one to three.

In step S301, for the process of constructing the first data set in advance, reference may be made to the process of constructing the data set by the data set module 12 in fig. 1 described above.

In the process of specifically executing S301, by analyzing parameters such as a field range and a data source related to the query parameter in the query request, a second data set meeting the requirement of the query parameter is determined from the first data set constructed in advance. The first data set comprises the second data set.
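One possible reading of this selection step is sketched below in Python: a first data set qualifies as a second data set when its model covers every field named in the query parameters. The selection criterion, the dictionary layout and all field names are assumptions for illustration; the source only states that the field range and data source of the query parameters are analyzed.

```python
def select_datasets(first_datasets, query_fields):
    """Return the data sets whose model covers every queried field."""
    wanted = set(query_fields)
    return [ds for ds in first_datasets if wanted <= ds["fields"]]


# Hypothetical first data sets, one per medical record model.
first_datasets = [
    {"name": "clinical_research",
     "fields": {"front_page.department", "admission.complaint"}},
    {"name": "patient_portrait", "fields": {"profile.age", "profile.tags"}},
    {"name": "monitoring", "fields": {"monitor.heart_rate"}},
]

second_datasets = select_datasets(
    first_datasets, ["front_page.department", "admission.complaint"]
)
print([ds["name"] for ds in second_datasets])
```

The number of data sets returned then decides which branch of S302 is taken: a single data set is queried by index name, several through a shared index alias.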

S302, judging whether a plurality of second data sets exist, if not, executing S303, and if so, executing S304.

S303, establishing a mapping relation between the index name and the second data set based on each index in the second data set.

S304, creating the same index alias for the indexes under the plurality of second data sets.

In one embodiment of the invention, after it is determined that a plurality of second data sets exist and before the same index alias is created in S304, the method further comprises intercepting any query condition in the query request that would corrupt the data structure of the second medical record data in the plurality of second data sets. In that case, the query conditions used subsequently are those that have passed the interception processing.

S305, establishing a mapping relation between the index alias and the plurality of second data sets.

The mapping relationships obtained after S303 or S305 are each used to indicate the storage location and data format of the second medical record data in the second data set corresponding to the index.

S306, converting the query condition in the query request into DSL statement.

In S306, the DSL statements include at least a range query statement and a match query statement, i.e., by converting the query conditions into DSL statements conforming to DSL syntax, DSL statements relating to a combination of range query and match query can be obtained.

S307, inquiring in the index based on the DSL statement, determining the medical record data id, and grouping the medical record data id according to the corresponding index to obtain a data group.

S308, for each data group, converting the index ids corresponding to the medical record data ids into medical record storage ids by using the mapping relation.

In S308, the mapping relationship refers to the mapping relationship between the index name and the second data set if S303 was executed, and to the mapping relationship between the index alias and the second data set if S305 was executed.

In the process of specifically executing S308, for each data group, the corresponding third data set can be quickly determined from the index name or the index alias through the mapping relationship, and the address and the data format of the data storage can be determined based on the content of the third data set, so that accurate medical record data can be obtained by the subsequent query.

It should be noted that the third data set is included in the second data set.

S309, acquiring third registration content pre-stored in the data registration center according to the index id, and constructing a final query condition based on the third registration content and the medical record storage id.

In the process of specifically executing S309, the corresponding index is determined according to the index id, and the third data set corresponding to the index id is determined from the second data set according to the mapping relationship between the index name of that index and the second data set. Based on the third data set and the inverted index, the registration path in the data registration center of the third data management tool and the third medical record model in the third data set is obtained, and the third registration content is determined according to the registration path. Based on the third registration content, the corresponding database and table are automatically identified and queried, and the storage location and data format of the third medical record data meeting the query conditions are automatically parsed and identified. The storage location, the data format and the medical record storage id of the third medical record data are then spliced to form the final query condition.

It should be noted that the foregoing S308 and S309 are performed for each data group, and the subsequent S310 performs heterogeneous merging on the obtained results.

S310, querying a database storing medical record data based on the final query condition, and merging the queried third medical record data to obtain the final medical record.

In the process of executing S310, if different formats exist for the queried third medical record data, converting the third medical record data with different formats into third medical record data with the same format through a heterogeneous merging mode, and merging to finally obtain a medical record containing the third medical record data with the uniform data format, namely a final medical record.

In the embodiment of the invention, unified data format standards can be preset, and the acquired third medical record data with different data formats can be automatically converted.

For example, in some data formats the test report is stored as a single test report document, whereas in others it is split into a test report detail document and a test report main table document; in the latter case, the two documents are merged to obtain the test report data.
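The test report case can be sketched in Python as a small normalization function. The target shape (`report` plus `items`), the key names `test_report`, `test_report_main` and `test_report_detail`, and the sample values are all hypothetical; the sketch only shows that both storage layouts can be brought to one preset format before merging.

```python
def merge_test_report(docs):
    """Return a test report in one assumed standard shape.

    The standard shape used here is {"report": ..., "items": [...]}.
    """
    if "test_report" in docs:
        # Layout A: the whole report is stored as a single document.
        whole = docs["test_report"]
        return {"report": whole["report"], "items": whole["items"]}
    # Layout B: a main table document plus a separate detail document.
    return {
        "report": docs["test_report_main"],
        "items": docs["test_report_detail"],
    }


single = {"test_report": {"report": {"report_no": "R1"},
                          "items": [{"name": "WBC"}]}}
split = {"test_report_main": {"report_no": "R1"},
         "test_report_detail": [{"name": "WBC"}]}

print(merge_test_report(single) == merge_test_report(split))
```

Once every source layout maps to the same shape, results from different data groups can be concatenated without the caller needing to know which tool version produced them.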

Therefore, the third medical record data with different data formats is automatically converted into the unified data format, so that the consistency of the medical record data is ensured, and the follow-up user can use and analyze conveniently. And meanwhile, the third medical record data is convenient to transfer and share among different hospital systems or modules.

In the processing method of medical record mixed data disclosed by the embodiment of the invention, the first data set, constructed through a mixed inverted index from the data of different first medical record models that has undergone data management, serves as the query basis. During the query process, the query conditions in the query request are processed based on the first data set to generate a unified DSL statement with which the query is executed. Different data structures and storage can thus be accommodated, improving query efficiency under any unstructured medical record mixed data structure.

In one embodiment of the present invention, the process of pre-constructing the first data set includes:

S11, acquiring different types of first medical record data under each medical theme, and constructing a corresponding medical record model according to each type of first medical record data under each medical theme.

S12, managing the first medical record data in the first medical record model by using first data management tools of different versions to generate first data storage metadata of the mixed data structure.

S13, registering the first data storage metadata in a data registration center to generate first registration content.

In S13, the data registration center includes at least a first level and a second level. The first level contains the first data management tools, which are first data management tools of different versions; the second level contains the first data storage metadata, which corresponds to the version of the first data management tool used in management. For the specific registration process, reference may be made to the registration process performed by the data registration center 12 in fig. 1.

S14, constructing an inverted index by using the first medical record model, the first data storage metadata and the first registration content, and generating different first data sets divided according to data sources based on the inverted index.

In S14, division by data source means either that, when different data centers of the same hospital use the same data management tool version and first medical record model, different first data sets are established for the different data centers, or that, when different hospitals in a region use the same data management tool version and first medical record model, different first data sets are established for the different hospitals.

In the invention, a first data set is constructed in advance, namely a mixed inverted index system is constructed, and a registration path of a first data management tool in a data registration center after data management based on a first medical record model can be obtained based on the first data set in a subsequent query process by constructing the first data set.
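The two-level registration and the inverted index built on top of it can be illustrated with a minimal Python sketch. The registry layout (tool version at the first level, storage metadata per model at the second level), the database and collection names, and the choice to index by field name are assumptions; the source does not specify the concrete structure of the index.

```python
from collections import defaultdict

# Assumed two-level registry: tool version -> model -> storage metadata.
registry = {
    "1.0.0": {"clinical_research": {"database": "db_a",
                                    "collection": "records"}},
    "2.0.0": {"clinical_research": {"database": "db_b",
                                    "collection": "binganshouye"}},
}


def build_inverted_index(registry, fields_by_model):
    """Map each field to the registration paths (version, model) holding it."""
    inverted = defaultdict(list)
    for version, models in registry.items():
        for model in models:
            for field in fields_by_model[model]:
                inverted[field].append((version, model))
    return dict(inverted)


fields_by_model = {"clinical_research": ["department", "complaint"]}
inverted = build_inverted_index(registry, fields_by_model)

# Resolve a field to every place it is stored, across tool versions.
for version, model in inverted["department"]:
    print(version, model, registry[version][model])
```

Looking up a field in such an index yields every registration path under which it is stored, which is what allows one query to span data managed by different tool versions.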

In an embodiment of the present invention, the specific process of executing S308, in which the index id corresponding to the medical record data id is converted into the medical record storage id by using the mapping relationship for each data group, includes:

S21, obtaining an index id corresponding to the medical record data id.

Medical record data managed by first data management tools of different versions, or medical record data in different data formats, is stored under different medical record data ids. S21 is therefore executed to determine the index id corresponding to the medical record data id, so that the corresponding third data set can later be determined from the index name or index alias by using the inverted index, and the storage location and the data format of the third medical record data can then be determined based on the third data set.

S22, for each data group, determining a third data set corresponding to the index id based on the mapping relation.

In S22, a third data set corresponding to the index id is determined in the second data set based on the mapping relationship, the third data set being included in the second data set.

S23, obtaining third registration content of third medical record data in a data registration center in a third data set, and determining a corresponding id generation rule according to the third registration content.

In S23, the third registration content includes, but is not limited to, a field of the third medical record model, a database name, a table, a data format for storing the third data storage metadata, and a concatenation rule of ids, that is, a generation rule of ids can be obtained through the registration content, which can be specifically referred to the above examples one to three.

S24, converting the index id into a medical record storage id according to an id generation rule.

In the process of executing S24, the index id corresponding to the third dataset is converted into the medical record storage id based on the id generation rule in the third medical record model determined by the third registration content.

In an embodiment of the present invention, the step of executing S309 to obtain the third registration content pre-stored in the data registration center according to the index id, and constructing the final query condition based on the third registration content and the medical record storage id includes:

S31, determining a corresponding third data set based on the index id and the mapping relation.

S32, determining corresponding third data storage metadata based on a third data management tool and a third medical record model in the third data set.

S33, determining third registration content based on the third data storage metadata in the data registration center.

In the process of executing S31 to S33, a registration path of the third medical record data corresponding to the third data set is acquired based on the index id, and then third registration contents of the third data storage metadata obtained after data management in the third data set in the data registration center are determined according to the registration path.

The above S31 to S33 may be performed in combination with the above S23, or may be performed in parallel. In the actual application process, after the third registration content is obtained in the process of executing S23, execution of S34 and S24 may be branched.

S34, identifying third medical record data meeting the query condition based on the third registration content.

In the process of specifically executing S34, automatic identification and parsing are performed based on the database name, table, data format, id generation rule and the like stored in the third registration content, so as to obtain the storage location and data format of the third medical record data that meets the query condition.

S35, splicing the storage position, the data format and the medical record storage id of the third medical record data to obtain the final query condition.

In an embodiment of the present invention, the step S310 of querying the database storing the first medical record data based on the final query condition, and merging the queried third medical record data to obtain the final medical record includes:

S41, querying a database storing third medical record data based on the final query condition to obtain corresponding third medical record data.

S42, converting the data format of the third medical record data to obtain the third medical record data which accords with the preset standard data format.

S43, merging the third medical record data which accords with the preset standard data format to obtain a final medical record.

It should be noted that the merging in S310 is a heterogeneous merging performed across the data groups.

In the invention, the first data storage metadata generated by first data management tools of different versions may be stored in different ways, so that the stored data formats differ. Unified format conversion is therefore performed on the acquired third medical record data with different data formats, which ensures data consistency and facilitates subsequent use and analysis.

Based on the above method for processing medical record mixed data disclosed in the embodiment of the present invention, the present invention also correspondingly discloses a device for processing medical record mixed data, as shown in fig. 4, where the processing device includes a first acquisition unit 401, a first establishment unit 402, a first conversion unit 403, a first query unit 404, a second conversion unit 405, a second acquisition unit 406, and a second query unit 407.

The first obtaining unit 401 is configured to obtain a query request, and determine, based on the query request, a second data set from a first data set that is previously constructed, where the query request includes a query condition and a query parameter that is constructed based on a first medical record model, where the first medical record model is constructed from different types of first medical record data under various medical topics, and each medical record model corresponds to one type of first medical record data.

The first establishing unit 402 is configured to establish, based on each index in the second data set, a mapping relationship between the index name and the second data set, where the mapping relationship is used to indicate a storage location and a data format of the second medical record data in the second data set corresponding to the index.

A first converting unit 403, configured to convert the query condition in the query request into a DSL statement, where the DSL statement includes at least a range query statement and a matching query statement.

The first query unit 404 is configured to query in the index based on the DSL statement, determine the medical record data ids, and group the medical record data ids according to the corresponding index to obtain a data group.

A second converting unit 405, configured to convert, for each data group, an index id corresponding to the medical record data id into a medical record storage id by using a mapping relationship.

The second obtaining unit 406 is configured to obtain third registration content pre-stored in the data registration center according to the index id, and construct a final query condition based on the third registration content and the medical record storage id.

The second query unit 407 is configured to query a database storing the third medical record data based on the final query condition, and combine the queried third medical record data to obtain a final medical record.

In an embodiment of the invention, the processing device further comprises a presetting unit. The presetting unit is specifically configured to: acquire different types of first medical record data under each medical topic, and construct a corresponding first medical record model according to each type of first medical record data under each medical topic; manage the first medical record data in the first medical record model by using first data management tools of different versions to generate first data storage metadata of the mixed data structure for the corresponding version; register the first data storage metadata in a data registration center to generate first registration content, wherein the data registration center comprises at least a first level containing the first data management tools and a second level containing the first data storage metadata; and construct an inverted index by using the first medical record model, the first data storage metadata and the first registration content, and generate, based on the inverted index, different first data sets divided according to data sources, wherein the data sources are different data centers of the same hospital, or different hospitals in a region, that use the same data management tool version and first medical record model.

In an embodiment of the present invention, the first establishing unit 402 is further configured to, if a plurality of second data sets exist, create the same index alias for the indexes under the plurality of second data sets and establish a mapping relationship between the index alias and the plurality of second data sets.

In an embodiment of the present invention, the first establishing unit 402 is further configured to intercept a query condition in the query request that destroys a data structure of the second medical record data in the plurality of second data sets after determining that the plurality of second data sets exist.

In an embodiment of the present invention, the second converting unit 405 is specifically configured to: obtain, for each data group, the index id corresponding to a medical record data id; determine, based on the mapping relationship, the third data set corresponding to the index id from the second data set; obtain the third registration content, in the data registration center, of the third medical record data in the third data set; determine the corresponding id generation rule according to the third registration content; and convert the index id into a medical record storage id according to the id generation rule.

In an embodiment of the present invention, the second obtaining unit 406 is specifically configured to: determine, for each data group, the corresponding third data set based on the index id and the mapping relation; determine the corresponding third data storage metadata based on the third data management tool and the third medical record model in the third data set; determine the third registration content in the data registration center based on the third data storage metadata; identify, based on the third registration content, the storage location and data format of the third medical record data that satisfies the query condition; and splice the storage location, the data format and the medical record storage id of the third medical record data to obtain the final query condition.

In an embodiment of the present invention, the second query unit 407 is specifically configured to: query, for each data group, a database storing third medical record data based on a medical record data query field to obtain the corresponding third medical record data; convert the data format of the third medical record data to obtain third medical record data that conforms to a preset standard data format; and merge the third medical record data that conforms to the preset standard data format to obtain a final medical record.

It should be noted that the processing device for medical record mixed data disclosed in the embodiment of the present invention may be specifically applied to the processing architecture for medical record mixed data disclosed in fig. 1.

In the processing device for medical record mixed data disclosed by the embodiment of the invention, the first data set, constructed through a mixed inverted index from the data of different first medical record models that has undergone data management, serves as the query basis. During the query process, the query conditions in the query request are processed based on the first data set to generate a unified DSL statement with which the query is executed. Different data structures and storage can thus be accommodated, improving query efficiency under any unstructured medical record mixed data structure.

Based on the processing device for medical record mixed data disclosed by the embodiment of the disclosure, each module can be realized by a hardware device composed of a processor and a memory. Specifically, the above modules are stored in a memory as program units, and the processor executes the program units stored in the memory to realize thread control.

The processor comprises a kernel, which fetches the corresponding program unit from the memory. One or more kernels may be provided, and database capacity expansion is realized by adjusting kernel parameters.

An embodiment of the present invention discloses a computer storage medium on which a program is stored; when the program is executed by a processor, the processing method of medical record mixed data disclosed by the embodiment of the present invention is implemented.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

The embodiment of the invention discloses an electronic device which comprises a memory, a processor and a program stored on the memory, wherein the processor executes the program to realize the processing method of medical record mixed data disclosed by the embodiment of the invention.

Specifically, as shown in fig. 5, a schematic structural diagram of an electronic device according to an embodiment of the present invention is disclosed. The electronic device 500 comprises at least one processor 501, and at least one memory 502 connected to the processor, and a bus 503.

The processor 501 and the memory 502 communicate with each other via a bus 503.

The processor 501 is configured to execute a program stored in the memory.

The memory 502 is configured to store a program, where the program is at least configured to implement a method for processing medical record mixed data disclosed in the above embodiment of the present invention.

In one typical configuration, the device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.

The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip. Memory is an example of a computer-readable medium.

In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment mainly describes its differences from the other embodiments. In particular, a system or system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, refer to the corresponding parts of the description of the method embodiment. The systems and system embodiments described above are merely illustrative: units illustrated as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue burden.

Those skilled in the art will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative elements and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for processing medical record mixed data, the method comprising:

2. The method of processing according to claim 1, wherein the pre-constructing the first data set comprises:

3. The method of processing according to claim 1, further comprising:

4. A processing method according to claim 3, further comprising, after determining that there are a plurality of second data sets:

5. The processing method according to any one of claims 1 to 4, characterized in that converting an index id corresponding to the medical record data id into a medical record storage id using the mapping relation, includes:

acquiring an index id corresponding to the medical record data id;

6. The processing method according to any one of claims 1 to 4, wherein acquiring third registered content stored in advance in a data registration center from the index id, and constructing a final query condition based on the third registered content and the medical record storage id, comprises:

7. The processing method according to any one of claims 1 to 4, wherein querying a database storing third medical record data based on the final query condition, merging the queried third medical record data to obtain a final medical record, includes:

8. A processing device for medical record mixed data, the processing device comprising:

9. A computer storage medium having a program stored thereon, wherein the program, when executed by a processor, implements the method for processing medical record mixed data according to any one of claims 1 to 7.

10. An electronic device comprising a memory, a processor and a program stored on the memory, wherein the processor executes the program to implement the method for processing medical record mixed data according to any one of claims 1 to 7.