CN109522331B - Individual-centered regionalized multi-dimensional health data processing method and medium - Google Patents

Publication number
CN109522331B
Authority
CN
China
Prior art keywords
data
main
health
personal
clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811203501.4A
Other languages
Chinese (zh)
Other versions
CN109522331A (en)
Inventor
金以东
李雪莉
周大胜
王语莫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ebaonet Healthcare Information Technology Beijing Co ltd
Original Assignee
Ebaonet Healthcare Information Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ebaonet Healthcare Information Technology Beijing Co ltd filed Critical Ebaonet Healthcare Information Technology Beijing Co ltd
Priority to CN201811203501.4A priority Critical patent/CN109522331B/en
Publication of CN109522331A publication Critical patent/CN109522331A/en
Application granted granted Critical
Publication of CN109522331B publication Critical patent/CN109522331B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a regional multi-dimensional health data processing method and medium taking an individual as a center, wherein the method comprises the following steps: carrying out duplicate removal processing on health data acquired from different data fields to obtain pure health data; carrying out standardized processing on the pure health data to obtain unified description health data; merging the uniformly described health data into data storage distinguished by data fields to obtain a personal information table; clustering the main identification data strips in different data warehouses to obtain main clustering data strips; calculating a first weight of each piece of data in the main clustering data strip according to a multi-domain personal data normalization weight analysis formula; normalizing the main clustering data strips according to the first weight and the first threshold to obtain main personal data strips; generating a master index number for the master personal data strip; and storing the main personal data strip in the global main index table according to the main index number. The invention realizes the cross validation of health data of different data fields and can dynamically and comprehensively reflect the personal health problems.

Description

Individual-centered regionalized multi-dimensional health data processing method and medium
Technical Field
The invention relates to the technical field of computers, in particular to an integration method of regional multi-dimensional health data, and specifically relates to a method and a medium for processing the regional multi-dimensional health data by taking an individual as a center.
Background
In recent years, with the rapid development of big data and cloud platforms and people's growing attention to their own health, the value of large-scale health data has become increasingly apparent, and the volume of health data is growing geometrically. However, health data can deliver real value only when massive medical data is classified and integrated in an orderly way. In the prior art, data integration methods are built on a single data source (for example, data inside one hospital) and organize data mainly by department or disease type, so different data sources and data types cannot be interconnected on a per-person basis. Even when data within a hospital is associated per individual, the internal data of each medical institution reflects only one or a few visits by that individual and cannot completely, dynamically and comprehensively reflect the individual's health problems.
In addition, existing personal data association techniques mainly link records through identity cards, driving licenses, military officer certificates, passports, medical insurance card numbers and the like, or identify a person by name, gender and year and month of birth. Such methods have a single, inflexible processing logic, and they cannot verify or establish the credibility of the basic personal information. If a name is a nickname, has been de-identified by a system, or the individual's data is simply wrong, the problem data can never be normalized to the right person, which greatly reduces the value of the health data.
Therefore, there is a need for a method that integrates and processes health data on a per-individual basis, can still normalize health data when some of its fields are incorrect, and thereby improves the application value of the health data.
Disclosure of Invention
In view of the above, the technical problem to be solved by the present invention is to provide a method and a medium for processing personal-centered regional multidimensional health data, which solve the problems that the prior art cannot integrate and process different types of health data from different data sources and cannot process partial field error health data.
In order to solve the above technical problem, a specific embodiment of the present invention provides a method for processing personal-centered regional multidimensional health data, including: according to the integrity rule, carrying out deduplication processing on health data acquired from different data domains to obtain pure health data; carrying out standardization processing on the pure health data to obtain unified description health data; merging the unified description health data into data storage distinguished by data fields to obtain a personal information table, wherein the personal information table comprises a main identification data strip and a non-main identification data strip; clustering the main identification data strips in different data warehouses to obtain main clustering data strips; calculating a first weight of each piece of data in the main clustering data strip according to a multi-domain personal data normalization weight analysis formula; normalizing the main clustering data strips according to the first weight and a first threshold to obtain main personal data strips; generating a master index number for the master personal data strip; and storing the main personal data strip in a global main index table according to the main index number.
Embodiments of the present invention also provide a computer storage medium containing computer-executable instructions that, when processed by a data processing device, perform a method for personal-centric regionalized multi-dimensional health data processing.
According to the above embodiments of the present invention, the method and medium for processing personal-centered regional multidimensional health data have at least the following advantages: the method not only transversely compares the health data in each data domain (data source), but also longitudinally compares the health data across the data domains, so that the cross validation of the health data is realized, and the reliability of a main personal data strip stored in a global main index table is ensured; the weight of the data domain can be configured according to the requirement, and the weight of the specific field of the health data can also be configured, so that the data flexibility is high; in the data matching process, the accurate matching and the fuzzy matching can be selected and used according to the characteristics of the non-main identification data strip, and the success rate of health data normalization is improved on the basis of ensuring the reliability of the health data; the optimal non-main identification data of different types of different data fields are compared and analyzed with the main index table, so that the problem of normalization of the personal health data of different data fields is effectively solved; the problem of normalized matching of health data under the condition of nickname, privacy and even source health data errors is solved through a multi-domain personal data normalization weight analysis formula.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a flowchart of a first embodiment of a method for processing personal-centric regionalized multidimensional health data according to an embodiment of the present invention.
Fig. 2 is a flowchart of a second embodiment of a personal-centric regionalized multidimensional health data processing method according to an embodiment of the present invention.
Fig. 3 is a flowchart of a third embodiment of a method for processing personal-centric regionalized multidimensional health data according to an embodiment of the present invention.
Fig. 4 is a flowchart of a fourth embodiment of a method for processing personal-centric regionalized multidimensional health data according to the embodiment of the present invention.
FIG. 5 is a schematic diagram of a method for periodically collecting health data from different data fields in batches using a button tool according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of cross-indexing a personal information table and a global main index table according to an embodiment of the present invention.
Detailed Description
To promote a clear understanding of the objects, aspects and advantages of the embodiments of the invention, reference is now made to the drawings and the detailed description. Various modifications of the embodiments described herein, and other embodiments of the invention, will be apparent to those skilled in the art.
The exemplary embodiments of the present invention and the description thereof are provided to explain the present invention and not to limit the present invention. Additionally, the same or similar numbered elements/components used in the drawings and the embodiments are used to represent the same or similar parts.
As used herein, the terms "first," "second," and so on do not denote any order or sequence, nor are they intended to limit the present invention; they are used only to distinguish elements or operations described with the same technical term.
With respect to directional terminology used herein, for example: up, down, left, right, front or rear, etc., are simply directions with reference to the drawings. Accordingly, the directional terminology used is intended to be illustrative and is not intended to be limiting of the present teachings.
As used herein, the terms "comprising," "including," "having," "containing," and the like are open-ended terms that mean including, but not limited to.
As used herein, "and/or" includes any and all combinations of the described items.
References to "plurality" herein include "two" and "more than two"; reference to "multiple sets" herein includes "two sets" and "more than two sets".
As used herein, the terms "substantially", "about" and the like are used to modify any slight variation in quantity or error that does not alter the nature of the variation. Generally, the range of slight variations or errors modified by such terms may be 20% in some embodiments, 10% in some embodiments, 5% in some embodiments, or other values. It should be understood by those skilled in the art that the aforementioned values can be adjusted according to actual needs, and are not limited thereto.
Fig. 1 is a flowchart of a first embodiment of a personal-centric regional multidimensional health data processing method according to an embodiment of the present invention, and as shown in fig. 1, collected health data is subjected to deduplication processing and then to normalization processing, so as to obtain uniformly described health data; then, the unified description health data are merged into a data warehouse to obtain a personal information table; clustering the main identification data strips in the personal information table, and calculating the weight of each piece of data in the main clustering data strips; then, normalizing the main clustering data strips according to the weight values to obtain main personal data strips; and finally, generating a main index number for the main personal data strip, and storing the main personal data strip into the global main index table according to the main index number.
In the embodiment shown in the figure, the personal-centric regionalized multidimensional health data processing method comprises the following steps:
Step 101: deduplicate the health data acquired from different data domains according to the integrity rule to obtain pure health data. In the embodiment of the invention, the integrity rule specifically means that health data must not lack a name: a record without a name is regarded as having no main data and is logically deleted.
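Step 101 can be illustrated with a minimal Python sketch. The text specifies only the name-based integrity rule; the record layout (dicts with hypothetical keys such as `name` and `deleted`) and the duplicate criterion (all fields identical) are assumptions made for illustration.

```python
def deduplicate(records):
    """Drop exact duplicates and logically delete records without a name."""
    seen = set()
    result = []
    for rec in records:
        # Assumption: two records are duplicates when all fields are equal.
        key = tuple(sorted((k, v) for k, v in rec.items() if k != "deleted"))
        if key in seen:
            continue
        seen.add(key)
        cleaned = dict(rec)
        # Integrity rule from the text: a record without a name has no main
        # data and is logically deleted (flagged, not physically removed).
        cleaned["deleted"] = not rec.get("name")
        result.append(cleaned)
    return result


# Hypothetical sample data; values are invented.
raw = [
    {"domain": "hospital", "name": "Zhang San", "id_number": "ID-001"},
    {"domain": "hospital", "name": "Zhang San", "id_number": "ID-001"},
    {"domain": "pharmacy", "name": "", "id_number": "ID-002"},
]
pure = deduplicate(raw)
```

Flagging instead of removing keeps the nameless record available for audit, which is one common reading of "logically deleted".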
Step 102: standardize the pure health data to obtain unified description health data. For example, in an embodiment of the invention, male is marked as "1" in the gender field of the medical insurance domain and as "B" in the gender field of the hospital domain; after normalization, both are output under the unified description "male". Step 102 specifically includes: analyzing the massive health data of each data domain to obtain a data identification rule for each data domain; and normalizing the specific field data of the pure health data according to the data identification rule.
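The gender example can be sketched as a per-domain lookup table in Python. The codes "1" and "B" for male come from the text; the female codes, the domain keys and the "unknown" fallback are assumptions for illustration only.

```python
# Hypothetical data identification rules, one mapping per data domain.
GENDER_RULES = {
    "medical_insurance": {"1": "male", "2": "female"},  # "2" is assumed
    "hospital": {"B": "male", "G": "female"},           # "G" is assumed
}


def normalize_gender(domain, raw_value):
    """Map a domain-specific gender code to the unified description."""
    rules = GENDER_RULES.get(domain, {})
    # Codes with no rule map to "unknown" rather than being guessed.
    return rules.get(str(raw_value).strip(), "unknown")
```

For instance, `normalize_gender("hospital", "B")` and `normalize_gender("medical_insurance", "1")` both yield the same unified value.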
Step 103: merge the unified description health data into data warehouses distinguished by data domain to obtain a personal information table, wherein the personal information table comprises a main identification data strip and a non-main identification data strip. In the embodiment of the invention, each piece of data in the main identification data strip includes a unique number, which is at least one of an identity card number, a military officer certificate number, a passport number and a medical insurance card number; each piece of data in the non-main identification data strip includes no unique number. For example, the non-main identification data strip includes: name, gender, year and month of birth, mobile phone number, contact phone number, country, province, city, home address and postal code. The data domains include a medical insurance domain, a hospital domain, a resident health record domain, a pharmacy domain, a health wearable device domain, a personal use behavior domain and the like. The data warehouse model is designed in advance, and each data domain has a corresponding data warehouse.
The medical insurance domain data includes: medical insurance personal information data, personal medical insurance payment data, personal medical insurance card status data, medical settlement lists and the like. The hospital domain data includes: in-hospital personal basic information data, outpatient prescription data, outpatient medical record data, examination data, laboratory test data, surgery and anesthesia data, electrocardiogram data, inpatient medical order data, inpatient medical record data, ICU (intensive care unit) data and the like. The resident health record domain data includes: personal basic information data, health examination data, child health care data, maternal information data, chronic disease management data, infectious disease management data, mental disease management data and the like. The pharmacy domain data includes: personal basic information data, personal medicine purchase data and the like. The health wearable device domain data includes: personal basic information data, blood glucose data, blood pressure data, heart rate data, blood oxygen content data, body temperature data, respiratory rate data and the like. The personal use behavior domain data includes: personal basic information data, and usage behavior data such as queries and browsing generated when the person uses internet products under authorization.
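The split of the personal information table into main and non-main identification data strips can be sketched as follows. The rule (presence of at least one unique number) is from the text; the field names are hypothetical.

```python
# Hypothetical field names for the four unique numbers listed in the text.
UNIQUE_NUMBER_FIELDS = (
    "id_card_number",
    "officer_certificate_number",
    "passport_number",
    "medical_insurance_card_number",
)


def split_identification_strips(records):
    """Return (main, non_main): records with and without a unique number."""
    main, non_main = [], []
    for rec in records:
        has_unique = any(rec.get(field) for field in UNIQUE_NUMBER_FIELDS)
        (main if has_unique else non_main).append(rec)
    return main, non_main


# Hypothetical sample data; values are invented.
table = [
    {"name": "Zhang San", "id_card_number": "ID-001"},
    {"name": "Zhang San", "phone": "555-0100"},
    {"name": "Li Si", "passport_number": "P-123"},
]
main_strips, non_main_strips = split_identification_strips(table)
```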
Step 104: cluster the main identification data strips in different data warehouses to obtain main clustering data strips. In the embodiment of the invention, equality (exact-match) clustering can be performed in sequence by identity card number, military officer certificate number, passport number and medical insurance card number. Table 1 below shows the main clustering data strips obtained by clustering by identity card number.
TABLE 1
Figure BDA0001830609530000071
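Equality clustering on one unique number can be sketched as a simple group-by in Python. The field name `id_card_number` and the sample rows are hypothetical, and only the first key of the sequence described above is shown.

```python
from collections import defaultdict


def cluster_by_field(strips, field):
    """Group strips whose value for `field` is exactly equal."""
    groups = defaultdict(list)
    for strip in strips:
        value = strip.get(field)
        if value:  # strips lacking this number are left for the next key
            groups[value].append(strip)
    return dict(groups)


# Hypothetical sample data; values are invented.
strips = [
    {"domain": "medical_insurance", "id_card_number": "ID-001"},
    {"domain": "hospital", "id_card_number": "ID-001"},
    {"domain": "pharmacy", "id_card_number": "ID-002"},
    {"domain": "wearable", "passport_number": "P-123"},
]
clusters = cluster_by_field(strips, "id_card_number")
```

Strips that carry no identity card number, such as the last row, would be picked up by a later pass on the next unique number.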
Step 105: calculate a first weight for each piece of data in the main clustering data strip according to the multi-domain personal data normalization weight analysis formula. In the embodiment of the invention, the multi-domain personal data normalization weight analysis formula is specifically:
Figure BDA0001830609530000072
where $\Omega_i$ is the first weight of each piece of data in the main clustering data strip; $w_i$ is the weight of data quality in the different data domains; $f_{i,j}$ is the number of times a specific field of one piece of data in the main clustering data strip successfully matches the same field of the other pieces of data in the main clustering data strip; $n$ is the number of pieces of data in the main clustering data strip; $u_j$ is the weight of the specific field data in the different data domains; and $i$ is the data domain number, where the data domains include a medical insurance domain, a hospital domain, a resident health record domain, a pharmacy domain, a health wearable device domain and a personal use behavior domain.
According to Table 1 above, the data parameters in the medical insurance domain are as follows:
$w_1 = 10$ (weight of medical insurance domain data quality);
$f_{1,1} = 3$ (number of successful matches between the medical insurance domain identity card number and the identity card numbers in the other data domains);
$f_{1,2} = 2$ (number of successful matches between the medical insurance domain name and the names in the other data domains);
$f_{1,3} = 3$ (number of successful matches between the medical insurance domain gender and the genders in the other data domains);
$f_{1,4} = 3$ (number of successful matches between the medical insurance domain date of birth and the dates of birth in the other data domains);
$f_{1,5} = 3$ (number of successful matches between the medical insurance domain address and the addresses in the other data domains, using fuzzy matching);
$f_{1,6} = 3$ (number of successful matches between the medical insurance domain mobile phone number and the mobile phone numbers in the other data domains);
$n = 4$ (total number of data domains);
$u_1 = 10$ (weight of the identity card number field);
$u_2 = 8$ (weight of the name field);
$u_3 = 6$ (weight of the gender field);
$u_4 = 8$ (weight of the date of birth field);
$u_5 = 8$ (weight of the address field);
$u_6 = 8$ (weight of the mobile phone number field).
These parameters can be set manually according to the specific application scenario. For example, if the data must be normalized strictly, the weights of fields such as name, year and month of birth, gender and address should be increased; otherwise, the weights of these fields can be reduced.
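The match counts $f_{i,j}$ defined above can be sketched in Python as follows. The sketch uses exact equality only (the text also allows fuzzy matching, for example on addresses), the field names are hypothetical, and it deliberately stops short of computing $\Omega_i$ because the formula itself is given only as a figure.

```python
def match_counts(cluster, fields):
    """For each strip i and field j, count how many OTHER strips in the
    cluster carry the same non-empty value for that field."""
    counts = []
    for i, strip in enumerate(cluster):
        row = {}
        for field in fields:
            value = strip.get(field)
            row[field] = sum(
                1
                for k, other in enumerate(cluster)
                if k != i and value and other.get(field) == value
            )
        counts.append(row)
    return counts


# Hypothetical cluster of four strips sharing one identity card number.
cluster = [
    {"id_card_number": "ID-001", "name": "Zhang San"},
    {"id_card_number": "ID-001", "name": "Zhang San"},
    {"id_card_number": "ID-001", "name": "Zhang San"},
    {"id_card_number": "ID-001", "name": "Li Si"},
]
counts = match_counts(cluster, ["id_card_number", "name"])
```

With this invented cluster, the first strip matches three others on identity card number and two others on name, the same shape as the $f_{1,1}$ and $f_{1,2}$ values listed above.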
Step 106: normalize the main clustering data strips according to the first weight and a first threshold to obtain main personal data strips. In the embodiment of the invention, $\Omega_1$, the first weight of the medical insurance domain data strip in the main clustering data strip, is calculated with the formula; $\Omega_2$, $\Omega_3$ and $\Omega_4$ are calculated in the same way. The main clustering data strips are then normalized according to a preset first threshold to obtain the main personal data strips, which are the data strips belonging to the same person. In this example, the first, second and third pieces of data belong to the same person and the fourth belongs to a different person, so the main personal data strips are the first, second and third pieces of data.
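The threshold step can be sketched in Python under two assumptions that the text does not state: that a strip is kept when its weight is greater than or equal to the threshold, and that the weights have already been computed. The numeric values below are invented placeholders, not results of the formula.

```python
def select_main_personal_strips(cluster, weights, threshold):
    """Keep the strips whose first weight reaches the first threshold.

    Assumption: 'reaches' means weight >= threshold.
    """
    return [strip for strip, w in zip(cluster, weights) if w >= threshold]


# Placeholder weights for four strips; not derived from the patent's formula.
cluster_ids = ["strip_1", "strip_2", "strip_3", "strip_4"]
first_weights = [0.9, 0.85, 0.8, 0.2]
main_personal = select_main_personal_strips(cluster_ids, first_weights, 0.5)
```

With these placeholder numbers, the first three strips are kept and the fourth is excluded, mirroring the outcome described in the example.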
Step 107: generate a main index number for the main personal data strip. In the embodiment of the invention, the main index number is generated by applying SHA256 to the identity card number, name, year and month of birth, home address, mobile phone number and a random number; the random number is introduced to avoid duplicate main index numbers. SHA256 is one of the Secure Hash Algorithms; it produces a 256-bit hash value, usually written as 64 hexadecimal characters.
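A minimal Python sketch of the main index number generation, using the standard `hashlib` and `secrets` modules. The field order, the `|` separator, the UTF-8 encoding and the 16-byte random component are assumptions; the text specifies only which inputs are hashed.

```python
import hashlib
import secrets


def generate_main_index_number(id_number, name, birth, address, phone):
    """Hash the identifying fields plus a random component with SHA-256."""
    nonce = secrets.token_hex(16)  # random number that avoids repeated indexes
    # Assumption: fields are joined with "|" before hashing.
    payload = "|".join([id_number, name, birth, address, phone, nonce])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical inputs; values are invented.
index_a = generate_main_index_number(
    "ID-001", "Zhang San", "1980-01", "1 Example Road", "555-0100"
)
index_b = generate_main_index_number(
    "ID-001", "Zhang San", "1980-01", "1 Example Road", "555-0100"
)
```

Because of the random component, identical inputs yield different index numbers, so the number must be stored when it is generated; it cannot be recomputed later from the personal fields.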
Step 108: store the main personal data strip in the global main index table according to the main index number. In the embodiment of the invention, the generated main index number must be assigned at the same time to every piece of data in the main personal data strip, and these pieces of data are located in different data domains; the global main index table is associated with the main personal data strip in each data domain through the main index number.
Referring to fig. 1, not only the health data in each data domain is compared horizontally, but also the health data can be compared longitudinally across the data domains, so that cross validation of the health data is realized, and the reliability of the main personal data strip stored in the global main index table is ensured; the weight of a data domain (data source) can be configured according to the requirement, and the weight of a specific field of health data can also be configured, so that the data flexibility is high; the problem of normalized matching of health data under the condition of nickname, privacy and even source health data errors is solved through a multi-domain personal data normalization weight analysis formula.
Fig. 2 is a flowchart of a second embodiment of a personal-centered regional multidimensional health data processing method according to a specific embodiment of the present invention. As shown in fig. 2, after the main personal data strips are stored in the global main index table according to the main index number, the non-main identification data strips are clustered to obtain non-main clustering data strips, and a second weight is calculated for each piece of data in the non-main clustering data strips; the optimal non-main identification data in each data warehouse is then selected according to the second weight; finally, the optimal non-main identification data is stored in the global main index table.
In the embodiment shown in the figure, after step 108, the method for processing the person-centered regionalized multi-dimensional health data further comprises:
Step 109: cluster the non-main identification data strips in different data warehouses to obtain non-main clustering data strips. In the embodiment of the invention, a non-main identification data strip becomes a non-main clustering data strip as soon as any one of its specific fields, such as name, mobile phone number or home address, is matched successfully. Clustering can be performed in sequence by name, mobile phone number, telephone number and home address. Table 2 below shows the non-main clustering data strips obtained by clustering by telephone number. Table 2 contains 4 non-main clustering data strips from different data domains: those from the medical insurance domain, the hospital domain and the wearable device domain belong to the same person, and the one from the personal use behavior domain belongs to another person.
TABLE 2
Figure BDA0001830609530000101
Step 110: calculate a second weight for each piece of data in the non-main clustering data strip according to the multi-domain personal data normalization weight analysis formula. In the embodiment of the present invention, the multi-domain personal data normalization weight analysis formula is specifically:
Figure BDA0001830609530000111
where $\Omega_i$ is the second weight of each piece of data in the non-main clustering data strip; $w_i$ is the weight of data quality in the different data domains; $f_{i,j}$ is the number of times a specific field of one piece of data in the non-main clustering data strip successfully matches the same field of the other pieces of data in the non-main clustering data strip; $n$ is the number of pieces of data in the non-main clustering data strip; $u_j$ is the weight of the specific field data in the different data domains; and $i$ is the data domain number, where the data domains include a medical insurance domain, a hospital domain, a resident health record domain, a pharmacy domain, a health wearable device domain, a personal use behavior domain and the like.
Step 111: select the optimal non-main identification data in each data warehouse according to the second weight. In the embodiment of the present invention, the selected optimal non-main identification data is shown in Table 3 below.
TABLE 3
Figure BDA0001830609530000112
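Selecting the optimal non-main identification data per data warehouse can be sketched in Python as follows, assuming that "optimal" means the strip with the highest second weight in its domain. That reading, the tuple layout and the weights below are assumptions for illustration.

```python
def select_optimal(strips_with_weights):
    """Pick, for each domain, the strip with the highest second weight.

    strips_with_weights: iterable of (domain, strip, second_weight).
    """
    best = {}
    for domain, strip, weight in strips_with_weights:
        if domain not in best or weight > best[domain][1]:
            best[domain] = (strip, weight)
    return {domain: strip for domain, (strip, _) in best.items()}


# Hypothetical candidates with placeholder weights.
candidates = [
    ("hospital", {"name": "Zhang San", "phone": "555-0100"}, 0.7),
    ("hospital", {"name": "Zhang", "phone": "555-0100"}, 0.4),
    ("pharmacy", {"name": "Zhang San", "phone": "555-0100"}, 0.6),
]
optimal = select_optimal(candidates)
```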
Step 112: store the optimal non-main identification data in the global main index table. In an embodiment of the present invention, step 112 specifically includes: matching the optimal non-main identification data against the global main index table; if the matching succeeds, storing the optimal non-main identification data in the global main index table according to the existing main index number; and if the matching fails, generating a new main index number and storing the optimal non-main identification data in the global main index table according to the new main index number. For example, the matching may use the combinations "name + gender + date of birth", "mobile phone number + home address" or "mobile phone number + date of birth". As long as one combination matches, the person already exists in the global main index table, and the optimal non-main identification data is stored under that person's main index number. If every combination fails to match, the optimal non-main identification data does not yet exist in the global main index table, so a new main index number is generated for it and the data is stored in the global main index table under the new main index number.
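The combination matching can be sketched in Python as follows. The three field combinations are taken from the text; the dict-based table, the field names, the exact-equality comparison, the merge behaviour on a successful match and the random stand-in for a new main index number are all assumptions.

```python
import secrets

# Field combinations named in the text (field names are hypothetical).
MATCH_COMBINATIONS = [
    ("name", "gender", "birth_date"),
    ("phone", "address"),
    ("phone", "birth_date"),
]


def find_main_index(record, global_index):
    """Return the index number of the first entry that agrees with `record`
    on every field of at least one combination, else None."""
    for index_number, entry in global_index.items():
        for combo in MATCH_COMBINATIONS:
            if all(record.get(f) and record.get(f) == entry.get(f) for f in combo):
                return index_number
    return None


def store_optimal(record, global_index):
    """Store `record` under a matching or newly generated index number."""
    index_number = find_main_index(record, global_index)
    if index_number is None:
        index_number = secrets.token_hex(32)  # stand-in for a new index number
        global_index[index_number] = dict(record)
    else:
        # Assumption: only fields missing from the existing entry are added.
        for key, value in record.items():
            global_index[index_number].setdefault(key, value)
    return index_number


# Hypothetical table and records; values are invented.
global_index = {
    "idx-1": {"name": "Zhang San", "gender": "male",
              "birth_date": "1980-01-01", "phone": "555-0100"},
}
known = {"name": "San", "phone": "555-0100",
         "birth_date": "1980-01-01", "address": "1 Example Road"}
stranger = {"name": "Li Si", "phone": "555-0199", "birth_date": "1990-05-05"}
known_index = store_optimal(known, global_index)
stranger_index = store_optimal(stranger, global_index)
```

Here the first record matches on "mobile phone number + date of birth" despite the shortened name, while the second matches no combination and receives a new index number.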
Referring to fig. 2, the optimal non-primary identification data is compared and analyzed with the primary index table, so that the problem of normalization of personal health data of different types and different data domains is effectively solved; in the matching process, the accurate matching and the fuzzy matching can be selected and used according to the characteristics of the optimal non-main identification data, so that the requirements of a data user are met; the problem of normalized matching of health data under the condition of nickname, privacy and even source health data errors is solved through a multi-domain personal data normalization weight analysis formula.
Fig. 3 is a flowchart of a third embodiment of a method for processing personal-centric regionalized multidimensional health data according to an embodiment of the present invention; fig. 6 is a schematic diagram of cross-indexing the personal information table and the global main index table according to an embodiment of the present invention, and as shown in fig. 3 and fig. 6, the personal information table and the global main index table are cross-indexed by the main index number and the intra-domain number, so that the personal information tables of all data domains are associated with each other.
In the embodiment shown in the figure, after step 112, the method for processing the person-centered regionalized multi-dimensional health data further comprises:
Step 113: cross-index the personal information table and the global main index table by main index number and intra-domain number. In an embodiment of the present invention, the intra-domain number includes at least one of a personal number, a clinic number, a medical order number, a sample number, a settlement order number and a resident health record number.
Referring to fig. 3 and 6, the health data of all the different data domains is associated through the main index number and the intra-domain number, so all health data related to an individual can be queried through that individual's main index number, which makes data search convenient.
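The cross index can be sketched in Python as a mapping from each main index number to the (data domain, intra-domain number) pairs that carry it. The table layout and key names are hypothetical.

```python
from collections import defaultdict


def build_cross_index(personal_tables):
    """Map main index number -> list of (domain, intra-domain number).

    personal_tables: {domain: [row, ...]}, where each row holds the
    hypothetical keys 'main_index_number' and 'intra_domain_number'.
    """
    cross = defaultdict(list)
    for domain, rows in personal_tables.items():
        for row in rows:
            cross[row["main_index_number"]].append(
                (domain, row["intra_domain_number"])
            )
    return dict(cross)


# Hypothetical personal information tables; values are invented.
tables = {
    "hospital": [{"main_index_number": "idx-1", "intra_domain_number": "C-77"}],
    "medical_insurance": [
        {"main_index_number": "idx-1", "intra_domain_number": "S-12"},
        {"main_index_number": "idx-2", "intra_domain_number": "S-13"},
    ],
}
cross_index = build_cross_index(tables)
```

Looking up `cross_index["idx-1"]` then lists every domain record of that person, which is the query path described above.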
Fig. 4 is a flowchart of a fourth embodiment of a method for processing personal-centric regionalized multidimensional health data according to the embodiment of the present invention; fig. 5 is a schematic diagram of periodically collecting health data in batches from different data domains by using a button tool according to an embodiment of the present invention. As shown in fig. 4 and 5, to protect the privacy of health data, health data is periodically collected in batches from the different data domains over a private network line by using the button tool.
In the embodiment shown in the figure, before step 101, the method for processing the personal-centered regionalized multidimensional health data further comprises:
step 100: health data is periodically collected in batches from different data fields by a private network line by using a button tool. In an embodiment of the invention, a key tool is used to collect the full amount of health data in the current day from different data fields each day. A special light channel can be laid between the health data acquisition equipment and the data source, the keyboard tool completes data format conversion, and acquired health data are stored in the database.
Referring to fig. 4, the private network line is used for transmitting the health data collected by the button tool, so that the privacy of the health data is ensured, and the user experience is good; a special light channel is laid between the health data acquisition equipment and the data source, the data acquisition efficiency is high, and the normal operation of the medical service institution network cannot be influenced by the acquisition of massive health data.
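The daily batch collection of step 100 might look roughly like the following Python sketch. It is only an illustration under stated assumptions: each data source is modeled as a plain callable, the "format conversion" is assumed to be serialization to JSON, and an in-memory SQLite database stands in for the target database. The function `collect_daily_batch` and the table `raw_health_data` are hypothetical names, and no real collection tool or network line is modeled.

```python
import json
import sqlite3
from datetime import date

def collect_daily_batch(conn, sources, day):
    """Pull one day's full set of records from every data domain and store them.

    sources: {domain_name: callable(day) -> list of dict}; each callable is a
    hypothetical stand-in for a read over the private line to that data source.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_health_data "
        "(domain TEXT, collected_on TEXT, payload TEXT)"
    )
    total = 0
    for domain, fetch in sources.items():
        rows = fetch(day)
        # Assumed format conversion: serialize each record to one common JSON layout.
        conn.executemany(
            "INSERT INTO raw_health_data VALUES (?, ?, ?)",
            [(domain, day.isoformat(), json.dumps(r, sort_keys=True)) for r in rows],
        )
        total += len(rows)
    conn.commit()
    return total

conn = sqlite3.connect(":memory:")
sources = {
    "hospital": lambda day: [{"name": "Zhang San", "clinic_no": "17"}],
    "pharmacy": lambda day: [{"name": "Zhang San"}, {"name": "Li Si"}],
}
stored = collect_daily_batch(conn, sources, date(2018, 10, 16))
```

In practice the function would be triggered once per day by a scheduler; the sketch leaves scheduling out.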
The invention provides a computer storage medium containing computer-executable instructions; when the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered regionalized multi-dimensional health data processing method. The method comprises the following steps:
Step 101: Deduplicate the health data collected from different data domains according to the integrity rules to obtain pure health data.
Step 102: Standardize the pure health data to obtain unified description health data.
Step 103: Merge the unified description health data into data warehouses distinguished by data domain to obtain a personal information table, wherein the personal information table comprises a main identification data strip and a non-main identification data strip.
Step 104: Cluster the main identification data strips in the different data warehouses to obtain main clustering data strips.
Step 105: Calculate a first weight of each piece of data in the main clustering data strip according to the multi-domain personal data normalization weight analysis formula.
Step 106: Normalize the main clustering data strips according to the first weight and a first threshold to obtain a main personal data strip.
Step 107: Generate a main index number for the main personal data strip.
Step 108: Store the main personal data strip in a global main index table according to the main index number.
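Steps 101 to 104 above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the integrity rule is assumed to be "a record must have a name", standardization is assumed to mean trimming whitespace and upper-casing the unique number, and main identification records are assumed to be those that carry a unique number (as claim 8 describes). The field names `name` and `id_no` and the sample values are made up.

```python
from collections import defaultdict

def deduplicate(records):
    """Step 101 (assumed integrity rule): drop records without a name, then exact duplicates."""
    seen, clean = set(), []
    for rec in records:
        if not rec.get("name"):
            continue
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            clean.append(rec)
    return clean

def standardize(rec):
    """Step 102 (assumed rules): trim whitespace and upper-case the unique number."""
    out = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
    if out.get("id_no"):
        out["id_no"] = out["id_no"].upper()
    return out

def cluster_main(warehouses):
    """Steps 103-104: split records by presence of a unique number, cluster main records by it."""
    clusters = defaultdict(list)
    non_main = []
    for domain, records in warehouses.items():
        for rec in records:
            tagged = dict(rec, domain=domain)
            if rec.get("id_no"):
                clusters[rec["id_no"]].append(tagged)
            else:
                non_main.append(tagged)
    return dict(clusters), non_main

raw = {
    "hospital": [{"name": "Zhang San ", "id_no": "11010119900101123x"},
                 {"name": "Zhang San ", "id_no": "11010119900101123x"},
                 {"name": "", "id_no": "999"}],
    "pharmacy": [{"name": "Zhang San", "id_no": "11010119900101123X"},
                 {"name": "Li Si"}],
}
warehouses = {d: [standardize(r) for r in deduplicate(rs)] for d, rs in raw.items()}
clusters, non_main = cluster_main(warehouses)
```

Here the two "Zhang San" records from different domains end up in one main cluster because standardization makes their unique numbers identical, while the record without a unique number is set aside as non-main identification data.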
The specific embodiment of the invention also provides a computer storage medium containing computer-executable instructions; when the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered regionalized multi-dimensional health data processing method. The method comprises the following steps:
Step 101: Deduplicate the health data collected from different data domains according to the integrity rules to obtain pure health data.
Step 102: Standardize the pure health data to obtain unified description health data.
Step 103: Merge the unified description health data into data warehouses distinguished by data domain to obtain a personal information table, wherein the personal information table comprises a main identification data strip and a non-main identification data strip.
Step 104: Cluster the main identification data strips in the different data warehouses to obtain main clustering data strips.
Step 105: Calculate a first weight of each piece of data in the main clustering data strip according to the multi-domain personal data normalization weight analysis formula.
Step 106: Normalize the main clustering data strips according to the first weight and a first threshold to obtain a main personal data strip.
Step 107: Generate a main index number for the main personal data strip.
Step 108: Store the main personal data strip in a global main index table according to the main index number.
Step 109: Cluster the non-main identification data strips in the different data warehouses to obtain non-main clustering data strips.
Step 110: Calculate a second weight of each piece of data in the non-main clustering data strip according to the multi-domain personal data normalization weight analysis formula.
Step 111: Select the optimal non-main identification data in each data warehouse according to the second weight.
Step 112: Store the optimal non-main identification data in the global main index table.
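The selection in step 111 can be sketched as below. The sketch assumes the second weights have already been computed (step 110) and are supplied as a list aligned with the records of one non-main cluster; it then keeps the highest-weighted record per data warehouse. The function name `select_optimal`, the `domain` key, and the sample data are hypothetical.

```python
def select_optimal(cluster, second_weights):
    """Within one non-main cluster, keep the highest-weighted record per data warehouse.

    cluster: list of records, each a dict with a 'domain' key.
    second_weights: list of floats; second_weights[k] belongs to cluster[k].
    On a tie, the record that appears first is kept.
    """
    best = {}
    for rec, weight in zip(cluster, second_weights):
        domain = rec["domain"]
        if domain not in best or weight > best[domain][0]:
            best[domain] = (weight, rec)
    return {domain: rec for domain, (weight, rec) in best.items()}

cluster = [
    {"domain": "hospital", "name": "Li Si", "phone": "555-0100"},
    {"domain": "hospital", "name": "Li Si", "phone": ""},
    {"domain": "pharmacy", "name": "Li Si", "phone": "555-0100"},
]
optimal = select_optimal(cluster, [0.9, 0.4, 0.7])
```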
The specific embodiment of the invention also provides a computer storage medium containing computer-executable instructions; when the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered regionalized multi-dimensional health data processing method. The method comprises the following steps:
Step 101: Deduplicate the health data collected from different data domains according to the integrity rules to obtain pure health data.
Step 102: Standardize the pure health data to obtain unified description health data.
Step 103: Merge the unified description health data into data warehouses distinguished by data domain to obtain a personal information table, wherein the personal information table comprises a main identification data strip and a non-main identification data strip.
Step 104: Cluster the main identification data strips in the different data warehouses to obtain main clustering data strips.
Step 105: Calculate a first weight of each piece of data in the main clustering data strip according to the multi-domain personal data normalization weight analysis formula.
Step 106: Normalize the main clustering data strips according to the first weight and a first threshold to obtain a main personal data strip.
Step 107: Generate a main index number for the main personal data strip.
Step 108: Store the main personal data strip in a global main index table according to the main index number.
Step 109: Cluster the non-main identification data strips in the different data warehouses to obtain non-main clustering data strips.
Step 110: Calculate a second weight of each piece of data in the non-main clustering data strip according to the multi-domain personal data normalization weight analysis formula.
Step 111: Select the optimal non-main identification data in each data warehouse according to the second weight.
Step 112: Store the optimal non-main identification data in the global main index table.
Step 113: Cross-index the personal information table and the global main index table according to the main index number and the intra-domain number.
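Step 112 stores the optimal non-main identification data in the global main index table; claim 2 refines this into a match against the table, with a new main index number generated when the match fails. The Python sketch below illustrates that flow under loud assumptions: matching is done on a single `name` field, fuzzy matching uses the standard library's `difflib.SequenceMatcher` with an arbitrary threshold of 0.85, and new numbers follow a made-up `M0001` pattern. None of these choices come from the patent.

```python
from difflib import SequenceMatcher

def names_match(a, b, fuzzy, threshold):
    """Exact comparison, or similarity-ratio comparison when fuzzy matching is selected."""
    if not fuzzy:
        return a == b
    return SequenceMatcher(None, a, b).ratio() >= threshold

def store_optimal(global_index, record, fuzzy=False, threshold=0.85):
    """Attach the record to a matching main index entry, or create a new entry.

    global_index: {main_index_no: {"name": str, "non_main": [records]}}
    Returns the main index number under which the record was stored.
    """
    for index_no, entry in global_index.items():
        if names_match(entry["name"], record["name"], fuzzy, threshold):
            entry["non_main"].append(record)      # match succeeded: reuse the number
            return index_no
    new_no = f"M{len(global_index) + 1:04d}"      # hypothetical numbering scheme
    global_index[new_no] = {"name": record["name"], "non_main": [record]}
    return new_no

index = {"M0001": {"name": "Zhang San", "non_main": []}}
hit = store_optimal(index, {"name": "Zhang Sann", "domain": "pharmacy"}, fuzzy=True)
miss = store_optimal(index, {"name": "Li Si", "domain": "hospital"})
```

The misspelled name is attached to the existing entry only because fuzzy matching was requested; with exact matching it would have received a new main index number.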
The specific embodiment of the invention also provides a computer storage medium containing computer-executable instructions; when the computer-executable instructions are processed by a data processing device, the data processing device executes the individual-centered regionalized multi-dimensional health data processing method. The method comprises the following steps:
Step 100: Periodically collect health data in batches from different data domains over a private network line by using the key tool.
Step 101: Deduplicate the health data collected from different data domains according to the integrity rules to obtain pure health data.
Step 102: Standardize the pure health data to obtain unified description health data.
Step 103: Merge the unified description health data into data warehouses distinguished by data domain to obtain a personal information table, wherein the personal information table comprises a main identification data strip and a non-main identification data strip.
Step 104: Cluster the main identification data strips in the different data warehouses to obtain main clustering data strips.
Step 105: Calculate a first weight of each piece of data in the main clustering data strip according to the multi-domain personal data normalization weight analysis formula.
Step 106: Normalize the main clustering data strips according to the first weight and a first threshold to obtain a main personal data strip.
Step 107: Generate a main index number for the main personal data strip.
Step 108: Store the main personal data strip in a global main index table according to the main index number.
The specific embodiment of the invention provides an individual-centered regionalized multi-dimensional health data processing method and medium that compare health data horizontally within each data domain (data source) and vertically across data domains, thereby cross-validating the health data and ensuring the reliability of the main personal data strip stored in the global main index table. The weight of each data domain can be configured as required, as can the weight of specific fields of the health data, so the data handling is highly flexible. During data matching, exact matching or fuzzy matching can be selected according to the characteristics of the non-main identification data strip, which raises the success rate of health data normalization while preserving the reliability of the health data. The optimal non-main identification data of different types from different data domains are compared and analyzed against the main index table, which effectively solves the problem of normalizing personal health data across data domains. The multi-domain personal data normalization weight analysis formula solves the problem of normalized matching of health data in the presence of nicknames, privacy protection, and even errors in the source health data.
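The configurable domain and field weights can be illustrated with a short Python sketch. The patent's actual weight formula survives here only as an image reference, so the combination used below, omega_i = w_i * sum over j of (u_j * f_ij / n), is purely an assumed form chosen for illustration and should not be read as the patented formula. Only the meanings of the symbols follow the text: w_i is the data-quality weight of the record's data domain, u_j is the weight of field j, f_ij counts how many other records in the cluster agree with record i on field j, and n is the number of records in the cluster. The rule for applying the first threshold is likewise an assumption.

```python
def first_weights(cluster, domain_weight, field_weight):
    """Assumed form: omega_i = w_i * sum_j(u_j * f_ij / n). Not the patent's formula."""
    n = len(cluster)
    weights = []
    for i, rec in enumerate(cluster):
        score = 0.0
        for field, u_j in field_weight.items():
            # f_ij: number of other records in the cluster that agree on this field
            f_ij = sum(
                1 for k, other in enumerate(cluster)
                if k != i and rec.get(field) is not None
                and other.get(field) == rec.get(field)
            )
            score += u_j * f_ij / n
        weights.append(domain_weight[rec["domain"]] * score)
    return weights

def normalize_cluster(cluster, weights, threshold):
    """Assumed rule: return the highest-weighted record if it reaches the threshold, else None."""
    best = max(range(len(cluster)), key=weights.__getitem__)
    return cluster[best] if weights[best] >= threshold else None

cluster = [
    {"domain": "medical_insurance", "name": "Zhang San", "birth": "1990-01-01"},
    {"domain": "hospital", "name": "Zhang San", "birth": "1990-01-01"},
    {"domain": "pharmacy", "name": "Zhang Shan", "birth": "1990-01-01"},
]
domain_weight = {"medical_insurance": 1.0, "hospital": 0.8, "pharmacy": 0.5}  # made-up values
field_weight = {"name": 0.6, "birth": 0.4}                                    # made-up values
weights = first_weights(cluster, domain_weight, field_weight)
main_record = normalize_cluster(cluster, weights, threshold=0.4)
```

Under these made-up weights, the record whose fields agree most often with the rest of the cluster, and which comes from the most trusted domain, is chosen as the main personal data record; the misspelled pharmacy record scores lowest.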
The embodiments of the invention described above may be implemented in various hardware, software code, or combinations of both. For example, an embodiment of the present invention may also be program code for executing the above method in a Digital Signal Processor (DSP). The invention may also relate to a variety of functions performed by a computer processor, digital signal processor, microprocessor, or Field Programmable Gate Array (FPGA). The processor described above may be configured according to the present invention to perform certain tasks by executing machine-readable software code or firmware code that defines certain methods disclosed herein. Software code or firmware code may be developed in different programming languages and in different formats or forms. Software code may also be compiled for different target platforms. However, the different code styles, types, and languages of software code and other types of configuration code that perform tasks in accordance with the present invention do not depart from the spirit and scope of the present invention.
The foregoing is merely an illustrative embodiment of the present invention, and any equivalent changes and modifications made by those skilled in the art without departing from the spirit and principle of the present invention should fall within the protection scope of the present invention.

Claims (9)

1. A method for person-centric regionalized multi-dimensional health data processing, the method comprising:
according to the integrity rule, carrying out deduplication processing on health data acquired from different data domains to obtain pure health data;
carrying out standardization processing on the pure health data to obtain unified description health data;
merging the unified description health data into data warehouses distinguished by data domain to obtain a personal information table, wherein the personal information table comprises a main identification data strip and a non-main identification data strip;
clustering the main identification data strips in different data warehouses to obtain main clustering data strips;
calculating a first weight of each piece of data in the main clustering data strip according to a multi-domain personal data normalization weight analysis formula;
normalizing the main clustering data strips according to the first weight and a first threshold to obtain main personal data strips;
generating a master index number for the master personal data strip;
storing the main personal data strip in a global main index table according to the main index number;
clustering the non-main identification data strips in different data warehouses to obtain non-main clustering data strips;
calculating a second weight of each piece of data in the non-main clustering data strip according to a multi-domain personal data normalization weight analysis formula;
selecting the optimal non-main identification data in each data warehouse according to the second weight; and
storing the optimal non-main identification data into the global main index table.
2. The individual-centered regionalized multi-dimensional health data processing method according to claim 1, wherein the step of storing the optimal non-main identification data into the global main index table specifically comprises:
matching the optimal non-main identification data with the global main index table;
if the matching is successful, storing the optimal non-main identification data into the global main index table according to the main index number;
and if the matching fails, generating a new main index number, and storing the optimal non-main identification data into the global main index table according to the new main index number.
3. The method of claim 1, wherein after the step of storing the optimal non-main identification data into the global main index table, the method further comprises:
cross-indexing the personal information table and the global main index table according to the main index number and the intra-domain number.
4. The individual-centered regionalized multi-dimensional health data processing method according to claim 3, wherein the intra-domain number comprises at least one of a personal number, a clinic number, a medical order number, a sample number, a settlement order number, and a resident health record number.
5. The method of claim 1, wherein prior to the step of carrying out deduplication processing on the health data acquired from different data domains according to the integrity rule to obtain pure health data, the method further comprises:
periodically collecting health data in batches from different data domains over a private network line by using a key tool.
6. The individual-centered regionalized multi-dimensional health data processing method according to claim 1, wherein the step of carrying out standardization processing on the pure health data to obtain unified description health data specifically comprises:
analyzing the massive health data of each data domain to obtain a data identification rule of each data domain; and
carrying out normalized processing on the specific field data of the pure health data according to the data identification rule.
7. The individual-centered regionalized multi-dimensional health data processing method according to claim 1, wherein the multi-domain personal data normalization weight analysis formula is specifically:
Figure FDA0002809187370000031
wherein ω_i is the first weight of each piece of data in the main clustering data strip; w_i is the weight for data quality in the different data domains; f_{i,j} is the number of times the specific field data of a piece of data in the main clustering data strip is successfully matched with the specific field data of the other data strips in the main clustering data strip; n is the number of pieces of data in the main clustering data strip; u_j is the weight of the specific field data in the different data domains; and i is a data domain number, wherein the data domains comprise a medical insurance domain, a hospital domain, a resident health record domain, a pharmacy domain, a health wearable device domain, and a personal use behavior domain.
8. The method of claim 1, wherein each piece of data in the master identification data strip includes a unique number, the unique number including at least one of an identification number, a military officer license number, a passport number, and a medicare card number; each piece of data in the non-main identification data strip does not comprise a unique number.
9. A computer storage medium containing computer-executable instructions which, when processed by a data processing apparatus, cause the data processing apparatus to perform the individual-centered regionalized multi-dimensional health data processing method according to any one of claims 1 to 8.
CN201811203501.4A 2018-10-16 2018-10-16 Individual-centered regionalized multi-dimensional health data processing method and medium Active CN109522331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811203501.4A CN109522331B (en) 2018-10-16 2018-10-16 Individual-centered regionalized multi-dimensional health data processing method and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811203501.4A CN109522331B (en) 2018-10-16 2018-10-16 Individual-centered regionalized multi-dimensional health data processing method and medium

Publications (2)

Publication Number Publication Date
CN109522331A CN109522331A (en) 2019-03-26
CN109522331B true CN109522331B (en) 2021-04-16

Family

ID=65770882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811203501.4A Active CN109522331B (en) 2018-10-16 2018-10-16 Individual-centered regionalized multi-dimensional health data processing method and medium

Country Status (1)

Country Link
CN (1) CN109522331B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694993B (en) * 2020-06-11 2023-05-02 北京金山云网络技术有限公司 Method, device, electronic equipment and medium for creating data index
CN113779022B (en) * 2021-02-07 2025-02-21 北京沃东天骏信息技术有限公司 Data backtracking output method and device, electronic device, and storage medium
CN113836141B (en) * 2021-09-24 2022-04-19 中国劳动关系学院 Big data cross indexing method based on distribution model
CN119943445A (en) * 2025-01-17 2025-05-06 中国人民解放军总医院第二医学中心 A comprehensive management system for medical care information

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101324894A (en) * 2008-07-24 2008-12-17 中国网络通信集团公司 Method and system for associating medical global identification and medical local identification
CN102005023A (en) * 2010-10-26 2011-04-06 汪海玥 National health medical file system managed by means of internet website
CN103870668A (en) * 2012-12-17 2014-06-18 上海联影医疗科技有限公司 Method and device for establishing master patient index oriented to regional medical treatment
CN104063567A (en) * 2013-03-20 2014-09-24 上海联影医疗科技有限公司 Establishment method of patient identity source cross reference
CN105574334A (en) * 2015-12-15 2016-05-11 深圳安泰创新科技股份有限公司 Medical information processing method and system
CN105678100A (en) * 2016-03-01 2016-06-15 万达信息股份有限公司 Health record browsing system
CN105787010A (en) * 2016-02-23 2016-07-20 北京凯行同创科技有限公司 Acquisition processing and pushing method and system based on personal data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant