CN114969041A - Processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing - Google Patents

Info

Publication number
CN114969041A
Authority
CN
China
Prior art keywords
entity
information
data
database
item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210592302.7A
Other languages
Chinese (zh)
Other versions
CN114969041B (en
Inventor
吴峰
张朝宗
李银生
王红
聂永川
任雁
毋鹏杰
杨扬
刘淼
张义倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Academy Of Science And Technology Information Hebei Academy Of Science And Technology Innovation Strategy
Original Assignee
Hebei Academy Of Science And Technology Information Hebei Academy Of Science And Technology Innovation Strategy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Academy Of Science And Technology Information Hebei Academy Of Science And Technology Innovation Strategy
Priority to CN202210592302.7A (CN114969041B)
Publication of CN114969041A
Priority to ZA2022/11776A (ZA202211776B)
Application granted
Publication of CN114969041B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing, applied in the field of big data processing. Through identity-probability calculation for main and subsidiary entities, index supplementation and data merging for identical entities, extraction and storage of entity directory items, and separation of entity sub-directory items, the invention systematically addresses the separate processing and grouping of main and subsidiary entities according to identity probability, cross-source entity merging and data supplementation, unified storage of entity relationships, and on-demand separation of entities. It thereby provides a feasible solution for multi-source, large-scale data association operations.

Description

Processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing
Technical Field
The invention relates to the technical field of big data application, in particular to a processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing.
Background
Existing methods for identifying, extracting and storing entities in multi-source data generally collect data by source or type and then match and identify records one by one according to the entity attributes of the data. Because they lack discriminating mechanisms such as entity bibliographic items, same-scene rules, and entity attribute classification and weighting, they suffer from data redundancy, non-uniform expression, low matching accuracy, low execution efficiency, and loss of identification-process information. These problems show up mainly in the following aspects:
1) Data redundancy and non-uniform expression. In the prior art, entities in heterogeneous data are generally collected by source or type. Because the indexes that describe an entity vary across the data, the indexes of the collected entity data are often inconsistent, so unified storage, standardized expression and external service provision cannot be achieved.
2) Low entity matching accuracy. Existing entity identification techniques generally match and identify according to the entity attributes of the data. Constrained by the variety of entity attributes and the large data volume, they commonly show low matching degree and low precision.
3) Inefficient entity identification. In the prior art, entities are usually judged one attribute after another in attribute order. Because entity attributes lack classification definitions and weight assignments, identification takes a long time to compute and attribute order is often inconsistent.
4) Entities are relatively static and data quality cannot be improved. In the prior art, entities are generally identified and extracted by direct separation, so attribute expansion is limited. Data are rarely or never mutually corrected, supplemented and expanded according to the implicit attributes among them, so the data cannot perfect themselves and data quality cannot be effectively guaranteed.
5) Loss of identification-process information. In the prior art, usually only the attribute information of successfully identified identical entities is recorded, and high-probability events during identification are rarely recorded, for example the case where two entities are judged very likely, but not certainly, to be the same entity. This hinders deep mining and analysis of data relationships.
Disclosure of Invention
The invention provides a processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing. It is used to solve problems such as identity discrimination of main and subsidiary entities and automatic data merging and supplementation for multi-source, multi-level data, and provides a feasible solution for multi-source, large-scale data association operations.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows.
A processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing specifically comprises the following steps:
A1. extracting the main entity bibliographic item MEFS and the subsidiary entity bibliographic item SEFS from the entity bibliographic item database EFDB of source A; extracting the application scene ES between the main entity M(m) and the subsidiary entity S(m) from the entity application scene database ESDB of source A; extracting the related entity information from the entity static database RSDB; and, using the single-source same-entity screening and data supplementing device, extracting information characterizing single-source identical entities according to the main entity and same-scene information, storing it in the same entity database SEDB, and performing data supplementation;
A2. extracting the related entity information from the entity static database RSDB; extracting the subsidiary entity bibliographic item SEFS from the entity bibliographic item database EFDB of source B; extracting the application scene ES between the main entity M(m) and the subsidiary entity S(m) from the entity application scene database ESDB of source B; extracting dynamic library entity data from the entity dynamic database RVDB; extracting identical-entity data from the same entity database SEDB; and, using the heterogeneous same-entity discriminator, judging the identity of heterogeneous (cross-source) entities according to rules, extracting information characterizing heterogeneous identical entities, passing it to the heterogeneous entity data supplementer, and at the same time storing it in the entity dynamic database RVDB;
A3. extracting dynamic library entity data from the entity dynamic database RVDB; extracting identical-entity data from the same entity database SEDB; receiving the information on heterogeneous identical entities from the heterogeneous same-entity discriminator; supplementing the heterogeneous entity information with the heterogeneous entity data supplementer according to the time-latest principle; and storing the supplemented information in the entity dynamic database RVDB;
A4. extracting identical-entity data from the same entity database SEDB; extracting dynamic library entity data from the entity dynamic database RVDB; and, using the entity directory item automatic extraction generator, extracting entity directory ELS information according to the entity directory essential items ELES and storing it in the entity directory database EDDB;
A5. extracting dynamic library entity data from the entity dynamic database RVDB; extracting entity directory information from the entity directory database EDDB; and, using the automatic sub-entity separator, separating sub-entity information from the entity directory database EDDB according to rules to form sub-entity directory information, which is stored in the entity directory database EDDB.
In the above processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing, the single-source same-entity screening and data supplementing device in step A1 works as follows:
A11. reading the single-source multi-library data set DSB from the entity static database RSDB of source A;
A12. reading the number N1 of libraries not yet warehoused from the entity bibliographic item database EFDB of source A, and setting n1 = 1;
A13. reading the main entity bibliographic item MEFS of library n1, obtaining its data set DSA together with the number of records I1 in DSA, and setting i1 = 1;
A14. reading the i1-th record in DSA and matching it against the data in DSB using the unique item K of the bibliographic item data; if the match succeeds, executing step A15; otherwise, executing step A19;
A15. extracting the information characterizing single-source entity identity for the main entity m1 corresponding to record i1, and writing it into the same entity database SEDB;
A16. reading, from the same entity database SEDB, the data set DSC of information characterizing identical entities for main entity m1 in source A;
A17. reading the subsidiary entity information set DSS corresponding to main entity m1 from the entity application scene database ESDB, and using the same-scene SS rule to judge whether a specific subsidiary entity s has an identical entity; if so, executing step A18; otherwise, executing step A19;
A18. extracting the identical-entity information of the specific subsidiary entity s, and writing it into the same entity database SEDB;
A19. judging whether I1 > i1 is true; if so, setting i1 = i1 + 1 and jumping to step A14; otherwise, jumping to step A110;
A110. judging whether N1 > n1 is true; if so, setting n1 = n1 + 1 and jumping to step A13; otherwise, ending.
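The nested loop in steps A11 to A110 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name `screen_single_source`, the in-memory list representation of the libraries and of DSB, and the output fields are all hypothetical, and the subsidiary-entity branch (steps A16 to A18) is left out for brevity.

```python
from typing import Dict, List

Record = Dict[str, str]


def screen_single_source(libraries: List[List[Record]],
                         dsb: List[Record],
                         key: str) -> List[Record]:
    """Keep every main-entity record whose unique item K also occurs in DSB."""
    # Index DSB by the unique item K so each lookup in step A14 is O(1).
    dsb_keys = {row[key] for row in dsb if row.get(key)}
    sedb: List[Record] = []  # stands in for the same entity database SEDB
    for n1, dsa in enumerate(libraries, start=1):       # A12 / A110: loop over libraries
        for i1, record in enumerate(dsa, start=1):      # A13 / A19: loop over records
            k = record.get(key)
            if k and k in dsb_keys:                     # A14: match on unique item K
                # A15: write the identity information (hypothetical fields)
                sedb.append({"library": str(n1), "record": str(i1), key: k})
    return sedb
```

Called with two libraries and a DSB containing the keys "1" and "2", the function returns one entry per matching record, tagged with the library and record position that produced it.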
In the above processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing, the heterogeneous same-entity discriminator in step A2 judges the identity of heterogeneous entities as follows:
A21. reading, by entity type, the number N2 of subsidiary entity types not yet warehoused from the entity bibliographic item database EFDB of source B, and setting n2 = 1;
A22. reading the related information of the specific subsidiary entity type n2, and obtaining the warehousing threshold TH that the system sets for subsidiary entity type n2;
A23. judging whether a corresponding entity dynamic database RVDB exists for subsidiary entity type n2; if so, executing step A24; otherwise, jumping to step A214;
A24. reading, from the same entity database SEDB, the data set DSF of information characterizing identical entities of subsidiary entity type n2;
A25. reading the dynamic library information data set DSD from the entity dynamic database RVDB;
A26. reading the set DSG of subsidiary entity type n2 from the entity bibliographic item database EFDB of source B, obtaining its number of records M2, and setting m2 = 1;
A27. reading record m2 of the subsidiary entity bibliographic item from the set DSG;
A28. reading, according to subsidiary entity type n2 and record m2, the specific application scene es between the subsidiary entity corresponding to record m2 and the main entity from the entity application scene database ESDB of source B;
A29. reading, according to subsidiary entity type n2 and record m2, the specific static library data set DSE corresponding to record m2 from the entity static database RSDB of source B;
A210. taking the set DSF from step A24, the set DSD from step A25, record m2 from step A27, the application scene es from step A28 and the set DSE from step A29; matching within the set DSD according to set rules, using the unique item, invariant item and common item attributes of record m2 together with the application scene es and the sets DSD, DSE and DSF; and calculating the similarity probability P(A) between entities;
A211. judging whether P(A) > TH is true; if not, jumping to step A213; if so, writing P(A) and the information characterizing the entity item into the same entity database SEDB;
A212. judging whether P(A) is true; if not, jumping to step A213; if so, passing record m2, the corresponding record item d of the set DSD, the corresponding record item e of the set DSE and the corresponding record item f of the set DSF to the heterogeneous entity data supplementer, and starting the heterogeneous entity data supplementer;
A213. judging whether M2 > m2 is true; if so, setting m2 = m2 + 1 and jumping to step A26; if not, executing step A214;
A214. judging whether N2 > n2 is true; if so, setting n2 = n2 + 1 and jumping to step A22; if not, ending.
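The text does not give the rule used to compute P(A) in step A210, so the following Python sketch only illustrates one plausible shape of steps A210 to A212: a weighted share of agreeing attributes, with the attribute classes K, UC and N weighted differently. The weights, the function names, and the action labels are hypothetical. The condition tested in step A212 is not legible in the text; the sketch assumes, purely for illustration, that only a probability of 1.0 triggers supplementation.

```python
from typing import Dict, Optional, Tuple

# Hypothetical integer weights per attribute class (K unique, UC invariant, N common).
# The source text does not specify any weight values.
WEIGHTS = {"K": 6, "UC": 3, "N": 1}


def similarity_probability(record: Dict[str, Optional[str]],
                           reference: Dict[str, Optional[str]],
                           attr_class: Dict[str, str]) -> float:
    """Weighted share of comparable attributes whose values agree."""
    matched = 0
    total = 0
    for name, cls in attr_class.items():
        a, b = record.get(name), reference.get(name)
        if a is None or b is None:
            continue  # attributes missing on either side are not compared
        total += WEIGHTS[cls]
        if a == b:
            matched += WEIGHTS[cls]
    return matched / total if total else 0.0


def discriminate(record: Dict[str, Optional[str]],
                 reference: Dict[str, Optional[str]],
                 attr_class: Dict[str, str],
                 th: float) -> Tuple[float, str]:
    """Return P(A) and the action taken for it."""
    p = similarity_probability(record, reference, attr_class)
    if p <= th:
        return p, "skip"                      # A211: P(A) > TH is false
    if p == 1.0:
        return p, "record_and_supplement"     # assumed reading of A212
    return p, "record"                        # A211: store P(A) in SEDB only
```

Storing the probability for every pair above the threshold, not only for certain matches, is what lets later analysis revisit likely but unconfirmed identical entities.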
In the above processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing, the heterogeneous entity information in step A3 is supplemented as follows:
A31. receiving record m2, the corresponding record item d of the set DSD, the corresponding record item e of the set DSE and the corresponding record item f of the set DSF;
A32. for the unique item, invariant item and common item attributes of the specific bibliographic item, obtaining the number of attributes N3, and setting n3 = 1;
A33. obtaining the attribute name of the n3-th attribute;
A34. reading the corresponding data dn of record item d by attribute name, then reading in turn the corresponding data of record m2, record item e and record item f, and comparing them with dn;
A35. judging whether dn is empty; if so, jumping to step A36; if not, going to step A37;
A36. supplementing dn with the most recent corresponding data from record m2, record item e and record item f according to the time-latest principle, and recording the timestamp and source information of the supplemented data;
A37. marking the timestamp and source information of the corresponding attribute data in record m2, record item e and record item f;
A38. forming a temporary record item d', and judging whether N3 > n3 is true; if so, jumping to step A33; otherwise, executing step A39;
A39. for attributes other than the unique item, invariant item and common item, reading in turn the corresponding attribute data in record m2, record item e and record item f, and comparing them with record item d;
A310. recording the timestamp and source information to form the latest temporary record item, and updating it into the entity dynamic database RVDB.
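The core of steps A34 to A36, filling an empty attribute with the most recent value available from the other records while keeping its timestamp and source, might look like the Python sketch below. The tuple layout `(value, timestamp, source)`, the integer timestamps, and the function name `supplement_record` are assumptions made for illustration; the source does not specify a data layout.

```python
from typing import Dict, List, Optional, Tuple

# Each attribute value carries its timestamp and source: (value, timestamp, source).
Stamped = Tuple[Optional[str], int, str]
Record = Dict[str, Stamped]


def supplement_record(d: Record, others: List[Record]) -> Record:
    """Fill empty attributes of d with the most recent non-empty value from others."""
    result = dict(d)  # work on a copy, playing the role of the temporary item d'
    names = set(d)
    for rec in others:
        names.update(rec)
    for name in names:
        current = result.get(name)
        if current is not None and current[0] is not None:
            continue  # A35 / A37: dn is not empty, keep the existing value
        candidates = [rec[name] for rec in others
                      if name in rec and rec[name][0] is not None]
        if candidates:
            # A36: time-latest principle; the tuple keeps timestamp and source
            result[name] = max(candidates, key=lambda s: s[1])
    return result
```

Here `others` would hold record m2, record item e and record item f. Existing non-empty values in d are never overwritten in this sketch, which matches the branch taken in step A35 when dn is not empty.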
In the above processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing, the entity directory information in step A4 is generated as follows:
A41. obtaining the number N4 of entity types according to the entity types set by the system, and setting n4 = 1;
A42. reading the entity directory item els and the entity directory essential item eles of entity n4;
A43. reading the 100% identical-entity data set DSH of entity n4 from the same entity database SEDB;
A44. according to the set DSH, extracting the data for the entity directory item els of entity n4 from the entity dynamic database by the time-latest principle, forming a temporary data set DSI;
A45. filtering the set DSI by the non-null principle for the entity directory essential item eles of entity n4, forming a data subset DSJ;
A46. writing the set DSJ into the entity directory database EDDB as the entity directory ELS information of entity n4;
A47. judging whether N4 > n4 is true; if so, setting n4 = n4 + 1 and jumping to step A42; otherwise, ending.
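Steps A44 and A45 amount to a projection followed by a non-null filter. A minimal Python sketch, assuming rows are plain dictionaries and that an empty string counts as missing (both assumptions, as is the name `build_directory`):

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def build_directory(rows: List[Row], els: List[str], eles: List[str]) -> List[Row]:
    """Project rows onto the ELS fields (DSI), then keep rows whose ELES fields are non-empty (DSJ)."""
    # A44: keep only the directory item fields; absent fields become None.
    dsi = [{field: row.get(field) for field in els} for row in rows]
    # A45: drop rows in which any essential item is None or empty.
    return [row for row in dsi if all(row.get(field) for field in eles)]
```

For an "organization" entity with `els = ["name", "code", "address"]` and `eles = ["name"]`, a row without a name is dropped, while a row without an address is kept with `address` set to `None`.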
In the above processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing, the entity directory information in step A5 is separated automatically as follows:
A51. starting the sub-entity separation program for a specific entity n5 on user instruction;
A52. reading the entity separation rule r specified by the user or preset;
A53. reading the directory data set DSO of the specific entity n5 from the entity directory database EDDB, and setting up a temporary data set DSP;
A54. obtaining the number of records I5 in the set DSO, and setting i5 = 1;
A55. reading record i5 in the set DSO, reading the corresponding dynamic library entity data in the entity dynamic database RVDB according to the information in record i5, and matching; if the match succeeds, executing step A56; otherwise, executing step A57;
A56. adding record i5 to the data set DSP;
A57. judging whether I5 > i5 is true; if so, setting i5 = i5 + 1 and jumping to step A55; if not, executing step A58;
A58. writing the data set DSP into the entity directory database EDDB.
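The separation loop of steps A53 to A58 can be sketched as below. The source does not describe the form of the separation rule r, so the sketch assumes it is a predicate over a directory record and its matching dynamic-library row; the lookup key `id`, the dictionary standing in for RVDB, and the function name are likewise hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def separate_sub_entities(dso: List[Row],
                          rvdb: Dict[str, Row],
                          rule: Callable[[Row, Row], bool],
                          key: str = "id") -> List[Row]:
    """Collect into DSP the directory records whose dynamic data satisfy rule r."""
    dsp: List[Row] = []                           # A53: temporary data set DSP
    for record in dso:                            # A54 / A57: loop over records of DSO
        dynamic = rvdb.get(record.get(key, ""))   # A55: look up the dynamic library data
        if dynamic is not None and rule(record, dynamic):
            dsp.append(record)                    # A56: matched, add to DSP
    return dsp                                    # A58: caller writes DSP to EDDB
```

A rule such as `lambda rec, dyn: dyn.get("parent") == "Acme"` would pull out the sub-entities that belong to one parent organization.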
Due to the adoption of the technical scheme, the technical progress of the invention is as follows.
Through identity-probability calculation for main and subsidiary entities, index supplementation and data merging for identical entities, extraction and storage of entity directory items, and separation of entity sub-directory items, the invention systematically addresses the separate processing and grouping of main and subsidiary entities according to identity probability, cross-source entity merging and data supplementation, unified storage of entity relationships, and on-demand separation of entities, and provides a feasible solution for multi-source, large-scale data association operations.
Its main effects are as follows.
1) Regular data and uniform expression. The invention identifies, extracts and stores data according to entity bibliographic items and performs secondary processing and extraction according to entity directory items. Compared with the prior art, the indexes are standardized and unified, the data can be stored in a regular and uniform way, entity expression is more consistent, and use is more flexible.
2) Improved matching accuracy and execution efficiency. The invention classifies the specific attributes of entities, assigns them different weights, and combines them with information such as same scenes for entity matching and extraction. Compared with the prior art, matching is less difficult and more precise; fewer attributes are computed, so execution is more efficient; and contradictions and inconsistencies among attribute values are effectively mitigated.
3) Improved data quality. While extracting and storing entity data, the invention extracts and identifies implicit attributes so that the entity data perfect and correct themselves.
4) Identical-entity probabilities are recorded. The invention stores and processes data separately according to the identity probability obtained during identification. Compared with the prior art, this improves the accuracy of data fusion, reduces the difficulty of secondary entity identification, and supports deep mining and data analysis across application scenes and entity relationships.
Drawings
FIG. 1 is a schematic structural view of the present invention;
FIG. 2 is a flow chart of the present invention;
FIG. 3 is a schematic diagram of the workflow of the single-source same-entity screening and data supplementing device of the present invention;
FIG. 4 is a schematic diagram of the workflow of the heterogeneous same-entity discriminator of the present invention;
FIG. 5 is a schematic diagram of the workflow of the heterogeneous entity data supplementer of the present invention;
FIG. 6 is a schematic diagram of the workflow of the entity directory item automatic extraction generator of the present invention;
FIG. 7 is a schematic diagram of the workflow of the automatic sub-entity separator of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
A processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing, applied in the field of big data processing, strips multi-source data entities into main and subsidiary entities, discriminates identical entities according to same scenes, entity attribute classification, weights and the like, and processes and stores the results separately according to the discrimination probability. This makes different application scenes, deep mining of entity relationships, and data analysis feasible.
In actual operation, information characterizing single-source identical entities is extracted first; next, information on heterogeneous (cross-source) identical entities is discriminated, and the data are supplemented and expanded; finally, entity directory items and entity sub-directory items are formed.
In the present invention, the following databases are used:
1) the entity static database RSDB (relative static database), which stores data of multiple libraries from the same source (single source);
2) the entity dynamic database RVDB (relative variety database), which stores the indexes and data of entities from different sources after integration;
3) the entity bibliographic item database EFDB (EntityFeatureDatabase), which stores the main entity bibliographic item MEFS and related data, the subsidiary entity bibliographic item SEFS and related data, and similar information;
4) the entity application scene database ESDB (EntitySenseDatabase), which stores the application scene ES between the main entity M(m) and the subsidiary entity S(m).
In the present invention, the terminology used includes:
1) Source (Source) S: a collection of data sets describing a specific subject, stable and continuous over a period of time;
2) Library (Data-Set) DS: a collection of data generated by a source over a certain period, which may consist of one or more two-dimensional data tables;
3) Table (Table) T: a two-dimensional data table in a library;
4) Entity (Entity): a relatively stable and unique research object described by a group of characteristic variables; entities are divided into main entities and subsidiary entities according to their dependency relationships;
5) Main entity (MainEntity): the research entity described by all or most of the attributes in a source. A source generally has only one main entity. It is written in the format "entity(main entity corresponding to the entity)", so the main entity is written M(m);
6) Subsidiary entity (SubsidiaryEntity): an entity in the source that depends on the main entity, usually a part of the main entity or a set of variables describing attributes of the main entity. It is written in the same format "entity(main entity corresponding to the entity)", so the subsidiary entity is written S(m);
7) Entity bibliographic item EFS (EntityFeatureStructure): a set of indexes that reflect the attributes of an entity;
8) Main entity bibliographic item MEFS (MainEntityFeatureStructure): a set of indexes that reflect the attributes of the main entity;
9) Subsidiary entity bibliographic item SEFS (SubsidiaryEntityFeatureStructure): a set of indexes that reflect the subsidiary entity and its association with the main entity, covering both the subsidiary entity's own attributes and the attributes describing the state of the main entity to which it belongs;
10) Same scene SS (SameSense): when entities are stripped, two records of a subsidiary entity within the same source are in the same scene if their indexes are consistent and their corresponding specific main entities are consistent.
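The same-scene definition reduces to three equality checks: same source, same specific main entity, and consistent indexes. A small Python sketch follows; the class name `SubsidiaryRecord`, its fields, and the reading of "consistent indexes" as set equality are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class SubsidiaryRecord:
    source: str                 # the source the record comes from
    main_entity: str            # identifier of the specific main entity m
    indexes: FrozenSet[str]     # names of the indexes describing the subsidiary entity


def same_scene(a: SubsidiaryRecord, b: SubsidiaryRecord) -> bool:
    """Same scene SS: same source, same main entity, consistent indexes."""
    return (a.source == b.source
            and a.main_entity == b.main_entity
            and a.indexes == b.indexes)
```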
To discriminate entity identity, the attributes of an entity bibliographic item are divided into unique items, invariant items and common items. The unique item K (Key) is an attribute that characterizes the uniqueness of an entity, such as an identity card number, a unified social credit code or an organization code. The invariant item UC (Unchange) is an attribute of an entity that rarely or never changes, such as the name and sex of a person entity, or the unit name and address of an organization entity. The common item N (Normal) covers the attributes of the entity other than the unique item K and the invariant item UC.
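One way to make this three-way classification usable in code is an enumeration whose numeric order doubles as a comparison priority. The Python sketch below is hypothetical: the source defines the three classes but does not state that attributes are compared in this order, and the names `AttrClass` and `comparison_order` are invented for illustration.

```python
from enum import IntEnum
from typing import Dict, List


class AttrClass(IntEnum):
    K = 0    # unique item, e.g. unified social credit code
    UC = 1   # invariant item, e.g. name or address
    N = 2    # common item, everything else


def comparison_order(attr_class: Dict[str, AttrClass]) -> List[str]:
    """Order attribute names by class (K first, then UC, then N), then alphabetically."""
    return sorted(attr_class, key=lambda name: (attr_class[name], name))
```

Comparing the unique item first means a match on K can settle most cases before the less discriminating attributes are examined.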
To provide services to external applications and to extract entity directory items, entity directory items and entity directory essential items are defined. The entity directory item ELS (EntityListStructure) is a limited set of attributes, selected for a specific application, that reflects the basic status of an entity; for an "organization" entity, for example, the basic items can be set to "organization name", "unified social credit code", "address" and so on. The entity directory essential item ELES (EntityListEssentialStructure) is a limited set of attributes, selected for a specific application, that ensures an entity directory entry is meaningful. These are typically name-type attributes whose absence would make a specific entity meaningless, for example the "organization name" of an "organization" entity or the "name" of a "personnel" entity.
In the invention, after entity identification, extraction and processing, the heterogeneous data are stored in two databases: the entity directory database EDDB (EntityDirectoryDatabase), which stores entity directory information from different sources for external services; and the same entity database SEDB (SameEntityDatabase), which stores information characterizing identical entities.
The implementation of the invention relies on several modules, shown in FIG. 1: the single-source same-entity screening and data supplementing device, the heterogeneous same-entity discriminator, the heterogeneous entity data supplementer, the entity directory item automatic extraction generator, and the automatic sub-entity separator.
The flow of the processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing is shown in FIG. 2 and specifically comprises the following steps.
A1. Extracting the main entity bibliographic item MEFS and the subsidiary entity bibliographic item SEFS from the entity bibliographic item database EFDB of source A; extracting the application scene ES between the main entity M(m) and the subsidiary entity S(m) from the entity application scene database ESDB of source A; extracting the related entity information from the entity static database RSDB; and, using the single-source same-entity screening and data supplementing device, extracting information characterizing single-source identical entities according to the main entity, the same scene and the like, storing it in the same entity database SEDB, and performing data supplementation.
In this step, the single-source same-entity screening and data supplementing device works as shown in FIG. 3, specifically as follows.
A11. Reading a single-source multi-library data set DSB from an entity static library database RSDB of a source A;
A12. reading the number N1 of unwarehoused libraries from an entity bibliographic item database EFDB of a source A, and setting n1 = 1;
A13. reading a main entity bibliographic item MEFS of a library n1, obtaining a data set DSA of the main entity bibliographic item MEFS, simultaneously obtaining the number of records I1 of the data set DSA, and setting i1 = 1;
A14. reading the i1 th record in the DSA, matching the data in the DSB by using the unique item K of the entry data, if the matching is successful, executing the step A15, and if the matching is unsuccessful, executing the step A19;
A15. extracting the related information representing the identity of the single-source entity of the main entity m1 corresponding to the record i1, and writing the related information into an identical entity database SEDB;
A16. reading a related information data set DSC of the master entity m1 in the source A, which characterizes the same entity, from the same entity database SEDB;
A17. reading an affiliated entity information set DSS corresponding to a main entity m1 from an entity application scene database ESDB, and judging whether a specific affiliated entity s has the same entity or not by using the same scene SS rule; if the same entity exists, performing step A18, otherwise, performing step A19;
A18. extracting the same entity related information of the specific affiliated entity s, and writing the same entity related information into a same entity database SEDB;
A19. judging whether I1 > i1 is true; if so, setting i1 = i1 + 1 and jumping to step A14; otherwise, jumping to step A110;
A110. judging whether N1 > n1 is true; if so, setting n1 = n1 + 1 and jumping to step A13; otherwise, ending.
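The nested loops of steps A11 to A110 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the in-memory data layout (`libraries`, `dsb`), the key name passed as `key`, and the predicate `has_same_entity` standing in for the same-scene SS rule are all hypothetical.

```python
def screen_single_source(libraries, dsb, key, has_same_entity):
    """Sketch of A11-A110 over in-memory data.

    libraries: list of dicts, each with "dsa" (main entity records) and
               "dss" (mapping: main entity key -> affiliated entity records).
    dsb:       static-library records (data set DSB).
    key:       name of the unique item K used for matching.
    has_same_entity: assumed stand-in for the same-scene SS rule.
    """
    sedb = []  # stands in for the same entity database SEDB
    dsb_keys = {rec[key] for rec in dsb if rec.get(key)}
    for lib in libraries:                      # A12 / A110: loop over libraries n1
        for rec in lib["dsa"]:                 # A13 / A19: loop over records i1
            if rec.get(key) not in dsb_keys:   # A14: match on unique item K
                continue
            sedb.append(("main", rec[key]))    # A15: store main entity identity
            for sub in lib["dss"].get(rec[key], []):  # A17: affiliated entities
                if has_same_entity(sub):
                    sedb.append(("affiliated", sub["name"]))  # A18
    return sedb
```

The explicit counters n1 and i1 of the text are replaced here by `for` loops, which visit the same libraries and records in the same order.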
A2. Extracting entity static library related entity information from an entity static database RSDB, extracting an accessory entity entry SEFS from an entity entry database EFDB of a source B, extracting an application scene ES between a main entity M (M) and an accessory entity S (M) from an entity application scene database ESDB of the source B, extracting dynamic library entity data information from an entity dynamic database RVDB, extracting same entity data information from a same entity database SEDB, judging the identity of heterogeneous entities according to rules by using a heterogeneous same-entity discriminator, extracting information representing the heterogeneous same entities, transmitting the information to a heterogeneous entity data supplementer, and simultaneously storing the information into a main entity dynamic database RVDB.
In this step, the process of discriminating the identity of the heterogeneous entity by the heterogeneous entity discriminator is shown in fig. 4, and the specific method is as follows.
A21. Reading the number N2 of affiliated entity types not yet warehoused from an entity entry database EFDB of a source B according to the entity types, and setting n2 = 1;
A22. reading the relevant information of the specific affiliated entity type n2, and simultaneously obtaining the warehousing threshold TH of the affiliated entity type n2 set by the system;
A23. judging whether the corresponding entity dynamic database RVDB exists or not according to the affiliated entity type n2, if so, executing the step A24, and if not, jumping to the step A214 for execution;
A24. reading a relevant information data set DSF representing the same affiliated entity type n2 from the same entity database SEDB according to the affiliated entity type n2;
A25. reading a dynamic library information data set DSD from an entity dynamic library RVDB;
A26. reading a set DSG of the affiliated entity type n2 from an affiliated entity entry database EFDB of a source B, obtaining the number of records M2, and setting m2 = 1;
A27. reading the m2-th affiliated entity bibliographic item record from the set DSG;
A28. reading a specific application scenario es between the affiliated entity corresponding to the record m2 and the main entity from an entity application scenario database (ESDB) of the source B according to the affiliated entity type n2 and the record m2;
A29. reading a specific static database data set DSE corresponding to the record m2 from an entity static database RSDB of the source B according to the affiliated entity type n2 and the record m2;
A210. acquiring the set DSF from step A24, the set DSD from step A25, the record m2 from step A27, the application scenario es from step A28 and the set DSE from step A29; matching in the set DSD according to set rules, using the unique item, invariable item and common item attributes of the affiliated entity bibliographic item record m2 together with the application scenario es and the information of the sets DSD, DSE and DSF; and calculating the similarity probability P(A) between the entities;
in this embodiment, when personnel entities are matched, for the information of two persons: if the identity card numbers are the same, P(A) = 100%; if the name and the mobile phone number are the same, P(A) = 100%; if the name and unit are the same, P(A) = 80%; and so on.
A211. Judging whether P(A) > TH is true; if not, jumping to step A213; if so, writing P(A) and the information representing the entity item into the same entity database SEDB;
A212. judging whether P (A) is true or not; if not, jumping to step A213; if so, transmitting the information of the record m2, the specific record item d corresponding to the set DSD, the specific record item e corresponding to the set DSE and the specific record item f corresponding to the set DSF to the heterogeneous entity data supplementer, and starting the operation of the heterogeneous entity data supplementer;
A213. judging whether M2 > m2 is true; if so, setting m2 = m2 + 1 and jumping to step A26; if not, performing step A214;
A214. judging whether N2 > n2 is true; if so, setting n2 = n2 + 1 and jumping to step A22; if not, ending.
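The rule-based similarity of step A210 and the threshold routing of steps A211 and A212 can be illustrated with the personnel example given in the embodiment. The sketch below is hypothetical in several respects: the attribute names (`id_number`, `name`, `mobile`, `unit`) are invented for illustration, unmatched pairs are assumed to score 0, and because the condition of step A212 is not legible in the text, the code assumes that only pairs with P(A) = 100% are passed on for data supplementation.

```python
def similarity(a, b):
    """Rule-based P(A) for two personnel records, following the embodiment's rules."""
    def same(*attrs):
        # An attribute only counts as matching when it is present and equal.
        return all(a.get(x) is not None and a.get(x) == b.get(x) for x in attrs)

    if same("id_number"):
        return 1.0
    if same("name", "mobile"):
        return 1.0
    if same("name", "unit"):
        return 0.8
    return 0.0  # assumption: no rule matched


def discriminate(record, dsd, th):
    """Route one record against the dynamic-library set DSD (A210-A212)."""
    sedb_rows, to_supplement = [], []
    for d in dsd:
        p = similarity(record, d)
        if p <= th:                       # A211: below threshold, skip this pair
            continue
        sedb_rows.append((record, d, p))  # A211: store P(A) with the pair
        if p == 1.0:                      # A212: condition assumed, see lead-in
            to_supplement.append((record, d))
    return sedb_rows, to_supplement
```

Storing every pair above the threshold TH, rather than only the certain matches, corresponds to the stated aim of keeping the discrimination probabilities for later analysis.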
A3. Extracting dynamic database entity data information from the entity dynamic database RVDB, extracting same entity data information from the same entity database SEDB, receiving the heterogeneous same-entity information from the heterogeneous same-entity discriminator, supplementing the heterogeneous entity information by using the heterogeneous entity data supplementer according to the time-latest principle and the like, and storing the supplemented heterogeneous entity information into the entity dynamic database RVDB.
In this step, the flow of the information supplementation of the heterologous entity is shown in fig. 5, and the specific method is as follows.
A31. Receiving information of a record m2, a specific record item d corresponding to the set DSD, a specific record item e corresponding to the set DSE and a specific record item f corresponding to the set DSF;
A32. for the unique item, invariable item and common item attributes of a specific bibliographic item, obtaining the number N3 of attributes, and setting n3 = 1;
A33. obtaining the attribute name of the n3 th attribute;
A34. reading the corresponding data dn of the record item d according to the attribute name, and simultaneously, sequentially reading the corresponding data of the record m2, the record item e and the record item f, and comparing the corresponding data with the dn;
A35. judging whether dn is empty, if so, jumping to the step A36 for execution, and if not, switching to the step A37 for execution;
A36. supplementing corresponding latest data in the record m2, the record item e and the record item f into dn according to a time latest principle, and recording a time stamp and source information of the supplemented data;
A37. marking the time stamp and the source information of the corresponding attribute data in the record m2, the record item e and the record item f;
A38. forming a temporary record item d', and judging whether N3 > n3 is true; if so, jumping to step A33, otherwise executing step A39;
A39. for other attributes except the unique item, the invariable item and the common item, reading corresponding attribute data in the record m2, the record item e and the record item f in sequence, and comparing the attribute data with the record item d;
A310. recording the time stamp and the source information to form the latest temporary record item, and updating it into the entity dynamic database RVDB.
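The time-latest supplementation of steps A34 to A36 can be sketched as below. This is a simplified illustration: the data layout (each candidate source maps an attribute to a `(value, timestamp)` pair), the function name `supplement` and the source labels are assumptions, and the sketch only fills empty attributes, leaving out the time stamp marking of non-empty attributes in step A37.

```python
def supplement(d, candidates, attributes):
    """Fill empty attributes of record item d with the most recent candidate value.

    d:          dict attr -> value (None or "" means empty).
    candidates: dict source -> dict attr -> (value, timestamp); the sources stand
                for record m2, record item e and record item f.
    Returns the temporary record item d' and the provenance of each filled attribute.
    """
    d_prime, provenance = dict(d), {}
    for attr in attributes:                        # A32 / A38: loop over attributes
        if d_prime.get(attr) not in (None, ""):    # A35: dn is not empty
            continue
        options = []
        for src, vals in candidates.items():       # A34: read the candidate values
            value, ts = vals.get(attr, (None, None))
            if value not in (None, ""):
                options.append((ts, src, value))
        if options:                                # A36: time-latest principle
            ts, src, value = max(options, key=lambda o: o[0])
            d_prime[attr] = value
            provenance[attr] = (src, ts)           # record time stamp and source
    return d_prime, provenance
```

Timestamps are assumed to be mutually comparable, for example ISO 8601 strings or `datetime` objects of one kind.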
A4. Extracting the same entity data information from the same entity database SEDB, extracting the dynamic database entity data information from the entity dynamic database RVDB, utilizing an entity directory item automatic extraction generator, extracting entity directory ELS information according to an entity directory essential item ELES, and storing the entity directory ELS information into the entity directory database EDDB.
In this step, a specific flow of the entity directory information is shown in fig. 6, and a generation method thereof is as follows.
A41. Setting entity types according to the system, obtaining the number N4 of the entity types, and setting n4 = 1;
A42. reading an entity directory item ELS and an entity directory essential item ELES of the entity n4;
A43. reading from the same entity database SEDB the same entity data set DSH of the entity n4 for which P(A) = 100%;
A44. according to the set DSH, extracting the related data information of the entity directory item ELS of the entity n4 from the entity dynamic database according to the time-latest principle to form a temporary data set DSI;
A45. filtering the set DSI according to the data non-null principle of the entity directory essential item ELES of the entity n4 to form a data subset DSJ;
A46. writing the set DSJ into the entity directory database EDDB as the entity directory ELS information of the entity n4;
A47. judging whether N4 > n4 is true; if so, setting n4 = n4 + 1 and jumping to step A42; otherwise, ending.
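For a single entity type, steps A43 to A45 amount to a filter, a latest-row selection and a non-null check. The sketch below shows that pipeline under assumed data shapes: the field names `entity_id`, `p` and `updated` and the function name `build_directory` are hypothetical, and the databases are replaced by in-memory lists.

```python
def build_directory(sedb_rows, dynamic_rows, els, eles):
    """Sketch of A43-A45 for one entity type.

    sedb_rows:    dicts with "entity_id" and similarity "p".
    dynamic_rows: dicts with "entity_id", "updated" and attribute values.
    els / eles:   directory items and essential items (attribute names).
    """
    # A43: keep only entities confirmed as identical (P(A) = 100%).
    confirmed = {r["entity_id"] for r in sedb_rows if r["p"] == 1.0}

    # A44: for each confirmed entity keep the most recently updated dynamic row.
    latest = {}
    for row in dynamic_rows:
        eid = row["entity_id"]
        if eid in confirmed and (eid not in latest or row["updated"] > latest[eid]["updated"]):
            latest[eid] = row
    dsi = [{"entity_id": eid, **{a: row.get(a) for a in els}} for eid, row in latest.items()]

    # A45: drop entries whose essential items are empty.
    return [r for r in dsi if all(r.get(a) not in (None, "") for a in eles)]
```

The returned list plays the role of the subset DSJ that step A46 writes into the entity directory database EDDB.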
A5. Extracting dynamic database entity data information from an entity dynamic database RVDB, extracting entity directory information from an entity directory database EDDB, automatically separating sub-entity information from the entity directory database EDDB by utilizing a sub-entity automatic separator according to rules to form sub-entity directory information, and storing the sub-entity directory information into the entity directory database EDDB.
In this step, the automatic separation method for the sub-entity directory information is shown in fig. 7, and is specifically as follows.
A51. Starting a sub-entity separation program of a specific entity n according to a user instruction;
A52. reading an entity separation rule r specified or preset by a user;
A53. reading a directory data set DSO of a specific entity n from an entity directory database EDDB, and setting a temporary data set DSP;
A54. obtaining the number I5 of records in the set DSO, and setting i5 = 1;
A55. reading record n5 in the set DSO, reading corresponding dynamic library entity data information in the entity dynamic database RVDB according to the information of record n5, matching, if matching is successful, executing step A56, otherwise, executing step A57;
A56. adding the record n5 into a data set DSP;
A57. judging whether I5 > i5 is true; if so, setting i5 = i5 + 1 and jumping to step A55; if not, executing step A58;
A58. and writing the data set DSP into an entity directory database EDDB.
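Steps A53 to A58 describe a single pass over the directory records that collects those matching the separation rule. A minimal sketch follows, assuming that the entity dynamic database can be looked up by a hypothetical `entity_id` field and that the separation rule r is supplied as a predicate over a directory record and its dynamic-library data; neither detail is specified in the text.

```python
def separate_sub_entities(dso, rvdb, rule):
    """Sketch of A53-A58.

    dso:  directory records of one entity type (data set DSO).
    rvdb: mapping entity_id -> dynamic-library data (stands in for RVDB).
    rule: assumed predicate implementing the separation rule r.
    """
    dsp = []                                      # A53: temporary data set DSP
    for record in dso:                            # A54 / A57: loop over records
        dynamic = rvdb.get(record["entity_id"])   # A55: read dynamic-library data
        if dynamic is not None and rule(record, dynamic):
            dsp.append(record)                    # A56: matched, add to DSP
    return dsp                                    # A58: caller writes DSP to EDDB
```

For instance, a rule such as `lambda rec, dyn: dyn.get("parent") is not None` would separate out records whose dynamic data names a parent organization; the `parent` field is likewise an invented example.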
The application of the invention can realize the following functions.
1) Main and affiliated entity bibliographic items and directory items are proposed. When entities in heterogeneous data are identified, the large number of varied data index items is screened and extracted according to the main and affiliated entity bibliographic items, which keeps the indexes characterizing an entity consistent and allows the data to be stored uniformly; the data are then processed and extracted a second time according to the entity directory items, which facilitates unified external services and large-scale calculation of data relations.
2) Entities are matched by scene. When entity attributes of the data are used for matching and identification, a same-scene identification mechanism is introduced according to the entity application scene of the data, which reduces the difficulty and complexity of entity matching and improves its accuracy.
3) Entity attribute classification and weights are proposed. According to the characteristics of entity attributes, the attributes of the entity bibliographic items are divided into unique items, invariable items and common items, each class is given a different weight, and entity identification is carried out using these weights, which helps reduce the computation time of entity identification and resolves problems such as contradictory attribute values.
4) Discrimination probabilities are stored and processed separately. During entity identification, in addition to the information on entities successfully identified as the same, the probability that several entities are the same entity is also recorded and is stored and processed separately, which reduces the difficulty of a second round of entity identification and facilitates deep mining and data analysis of different scene applications and entity relations.

Claims (6)

1. A processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing is characterized by comprising the following steps:
A1. extracting a main entity bibliography MEFS and an accessory entity bibliography SEFS from an entity bibliography database EFDB of a source A, extracting an application scene ES between a main entity M (M) and an accessory entity S (M) from an entity application scene database ESDB of the source A, extracting entity static database related entity information from an entity static database RSDB, extracting information representing a single-source same entity according to the main entity and the same scene information by using a single-source same entity screening and data supplementing device, storing the information into a same entity database SEDB, and performing data supplementation;
A2. extracting entity static library related entity information from an entity static database RSDB, extracting an accessory entity entry SEFS from an entity entry database EFDB of a source B, extracting an application scene ES between a main entity M (M) and an accessory entity S (M) from an entity application scene database ESDB of the source B, extracting dynamic library entity data information from an entity dynamic database RVDB, extracting same entity data information from a same entity database SEDB, judging the identity of heterogeneous entities according to rules by utilizing a heterogeneous same-entity discriminator, extracting information representing the heterogeneous same entity, transmitting the information to a heterogeneous entity data supplementer, and simultaneously storing the information into a main entity dynamic database RVDB;
A3. extracting dynamic database entity data information from an entity dynamic database RVDB, extracting same entity data information from a same entity database SEDB, receiving the heterogeneous same-entity information from the heterogeneous same-entity discriminator, supplementing the information of the heterogeneous entity by using the heterogeneous entity data supplementer according to a time-latest principle, and storing the supplemented heterogeneous entity information into the entity dynamic database RVDB;
A4. extracting the same entity data information from the same entity database SEDB, extracting the entity data information of the dynamic database from the entity dynamic database RVDB, utilizing an entity directory item automatic extraction generator, extracting entity directory ELS information according to an entity directory essential item ELES, and storing the entity directory ELS information into an entity directory database EDDB;
A5. extracting dynamic database entity data information from an entity dynamic database RVDB, extracting entity directory information from an entity directory database EDDB, automatically separating sub-entity information from the entity directory database EDDB by utilizing a sub-entity automatic separator according to rules to form sub-entity directory information, and storing the sub-entity directory information into the entity directory database EDDB.
2. The processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing according to claim 1, wherein the working method of the single-source same-entity screening and data supplementing device in step A1 is as follows:
A11. reading a single-source multi-library data set DSB from an entity static library database RSDB of a source A;
A12. reading the number N1 of unwarehoused libraries from an entity bibliographic item database EFDB of a source A, and setting n1 = 1;
A13. reading a main entity bibliographic item MEFS of a library n1, obtaining a data set DSA of the main entity bibliographic item MEFS, simultaneously obtaining the number of records I1 of the data set DSA, and setting i1 = 1;
A14. reading the i1 th record in the DSA, matching the unique item K of the entry data with the data in the DSB, if the matching is successful, executing the step A15, and if the matching is unsuccessful, executing the step A19;
A15. extracting the related information representing the identity of the single-source entity of the main entity m1 corresponding to the record i1, and writing the related information into an identical entity database SEDB;
A16. reading a related information data set DSC of the master entity m1 in the source A, which characterizes the same entity, from the same entity database SEDB;
A17. reading an affiliated entity information set DSS corresponding to a main entity m1 from an entity application scene database ESDB, and judging whether a specific affiliated entity s has the same entity or not by using the same scene SS rule; if the same entity exists, performing step A18, otherwise, performing step A19;
A18. extracting the same entity related information of the specific affiliated entity s, and writing the same entity related information into a same entity database SEDB;
A19. judging whether I1 > i1 is true; if so, setting i1 = i1 + 1 and jumping to step A14; otherwise, jumping to step A110;
A110. judging whether N1 > n1 is true; if so, setting n1 = n1 + 1 and jumping to step A13; otherwise, ending.
3. The processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing according to claim 2, wherein the method by which the heterogeneous same-entity discriminator discriminates heterogeneous entity identity in step A2 comprises:
A21. reading the number N2 of affiliated entity types not yet warehoused from an entity bibliography item database EFDB of a source B according to the entity types, and setting n2 = 1;
A22. reading the relevant information of the specific affiliated entity type n2, and simultaneously obtaining the warehousing threshold TH of the affiliated entity type n2 set by the system;
A23. judging whether the corresponding entity dynamic database RVDB exists or not according to the affiliated entity type n2, if so, executing the step A24, and if not, jumping to the step A214 for execution;
A24. reading a relevant information data set DSF representing the same affiliated entity type n2 from the same entity database SEDB according to the affiliated entity type n2;
A25. reading a dynamic library information data set DSD from an entity dynamic library RVDB;
A26. reading a set DSG of an affiliated entity type n2 from an affiliated entity bibliography database EFDB of a source B to obtain a record number M2, and setting m2 = 1;
A27. reading the m2-th affiliated entity bibliographic item record from the set DSG;
A28. reading a specific application scenario es between the affiliated entity corresponding to the record m2 and the main entity from an entity application scenario database ESDB of the source B according to the affiliated entity type n2 and the record m2;
A29. reading a specific static database data set DSE corresponding to the record m2 from an entity static database RSDB of the source B according to the affiliated entity type n2 and the record m2;
A210. acquiring the set DSF from step A24, the set DSD from step A25, the record m2 from step A27, the application scenario es from step A28 and the set DSE from step A29; matching in the set DSD according to a set rule, using the unique item, invariable item and common item attributes of the affiliated entity bibliographic item record m2 together with the application scenario es and the information of the sets DSD, DSE and DSF; and calculating a similarity probability P(A) between entities;
A211. judging whether P(A) > TH is true; if not, jumping to step A213; if so, writing P(A) and the information representing the entity item into the same entity database SEDB;
A212. judging whether P (A) is true or not; if not, jumping to step A213; if so, transmitting the information of the record m2, the specific record item d corresponding to the set DSD, the specific record item e corresponding to the set DSE and the specific record item f corresponding to the set DSF to the heterogeneous entity data supplementer, and starting the operation of the heterogeneous entity data supplementer;
A213. judging whether M2 > m2 is true; if so, setting m2 = m2 + 1 and jumping to step A26; if not, performing step A214;
A214. judging whether N2 > n2 is true; if so, setting n2 = n2 + 1 and jumping to step A22; if not, ending.
4. The processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing according to claim 3, wherein the method for supplementing the heterogeneous entity information in step A3 comprises:
A31. receiving information of a record m2, a specific record item d corresponding to the set DSD, a specific record item e corresponding to the set DSE and a specific record item f corresponding to the set DSF;
A32. for the unique item, invariable item and common item attributes of a specific bibliographic item, obtaining the number N3 of the attributes, and setting n3 = 1;
A33. obtaining the attribute name of the n3 th attribute;
A34. reading the corresponding data dn of the record item d according to the attribute name, and simultaneously, sequentially reading the corresponding data of the record m2, the record item e and the record item f, and comparing the corresponding data with the dn;
A35. judging whether dn is empty, if so, jumping to the step A36 for execution, and if not, switching to the step A37 for execution;
A36. supplementing corresponding latest data in the record m2, the record item e and the record item f into dn according to a time latest principle, and recording a time stamp and source information of the supplemented data;
A37. marking the time stamp and the source information of the corresponding attribute data in the record m2, the record item e and the record item f;
A38. forming a temporary record item d', and judging whether N3 > n3 is true; if so, jumping to step A33, otherwise executing step A39;
A39. for other attributes except the unique item, the invariable item and the common item, reading corresponding attribute data in the record m2, the record item e and the record item f in sequence, and comparing the attribute data with the record item d;
A310. recording the time stamp and the source information to form the latest temporary record item, and updating it into the entity dynamic database RVDB.
5. The processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing according to claim 4, wherein the method for generating the entity directory information in step A4 comprises the following steps:
A41. setting entity types according to the system, obtaining the number N4 of the entity types, and setting n4 = 1;
A42. reading an entity directory item ELS and an entity directory essential item ELES of the entity n4;
A43. reading from the same entity database SEDB the same entity data set DSH of the entity n4 for which P(A) = 100%;
A44. according to a set DSH, extracting relevant data information of an entity directory entry els of an entity n4 from an entity dynamic database according to a latest time principle to form a temporary data set DSI;
A45. according to a data non-null principle of a necessary item eles of an entity directory of an entity n4, filtering a set DSI to form a data subset DSJ;
A46. writing the set DSJ into an entity directory database EDDB as entity directory ELS information of the entity n4;
A47. judging whether N4 > n4 is true; if so, setting n4 = n4 + 1 and jumping to step A42; otherwise, ending.
6. The processing method for multi-source main and subsidiary entity identity discrimination and data self-complementing according to claim 5, wherein the automatic separation method of the sub-entity directory information in step A5 comprises the following steps:
A51. starting a sub-entity separation program of a specific entity n5 according to a user instruction;
A52. reading an entity separation rule r specified or preset by a user;
A53. reading a directory data set DSO of a specific entity n5 from an entity directory database EDDB, and setting a temporary data set DSP;
A54. obtaining the number I5 of records in the set DSO, and setting i5 = 1;
A55. reading record n5 in the set DSO, reading corresponding dynamic library entity data information in the entity dynamic database RVDB according to the information of record n5, matching, if matching is successful, executing step A56, otherwise, executing step A57;
A56. adding the record n5 into a data set DSP;
A57. judging whether I5 > i5 is true; if so, setting i5 = i5 + 1 and jumping to step A55; if not, executing step A58;
A58. and writing the data set DSP into an entity directory database EDDB.
CN202210592302.7A 2022-05-27 2022-05-27 Multi-source main and auxiliary entity identity discrimination and data self-supplementing processing method Active CN114969041B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210592302.7A CN114969041B (en) 2022-05-27 2022-05-27 Multi-source main and auxiliary entity identity discrimination and data self-supplementing processing method
ZA2022/11776A ZA202211776B (en) 2022-05-27 2022-10-28 Multisource main-subsidiary entity identity discrimination and data self-supplementation processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210592302.7A CN114969041B (en) 2022-05-27 2022-05-27 Multi-source main and auxiliary entity identity discrimination and data self-supplementing processing method

Publications (2)

Publication Number Publication Date
CN114969041A true CN114969041A (en) 2022-08-30
CN114969041B CN114969041B (en) 2023-06-30

Family

ID=82958053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210592302.7A Active CN114969041B (en) 2022-05-27 2022-05-27 Multi-source main and auxiliary entity identity discrimination and data self-supplementing processing method

Country Status (2)

Country Link
CN (1) CN114969041B (en)
ZA (1) ZA202211776B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680325A (en) * 2023-06-25 2023-09-01 杭州电子科技大学 Time-series record link data matching method and device based on attribute correlation
WO2025107392A1 (en) * 2023-11-21 2025-05-30 深圳计算科学研究院 Data cleaning method and apparatus for erroneously matched entity, device and medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1656442A (en) * 2001-12-28 2005-08-17 杰佛里·詹姆斯·乔纳斯 Real-time data storage
US20090172047A1 (en) * 2007-12-28 2009-07-02 Knowledge Computing Corporation Method and Apparatus for Loading Data Files into a Data-Warehouse System
CN105893526A (en) * 2016-03-30 2016-08-24 上海坤士合生信息科技有限公司 Multi-source data fusion system and method
US20200089692A1 (en) * 2018-03-27 2020-03-19 Innoplexus Ag System and method for identifying at least one association of entity
CN112231283A (en) * 2020-09-08 2021-01-15 苏宁金融科技(南京)有限公司 Generation management method and system based on multi-source heterogeneous data unified entity identification code
CN113076306A (en) * 2021-06-07 2021-07-06 航天神舟智慧系统技术有限公司 Data resource automatic collection method and system based on cataloguing rule
CN113342909A (en) * 2021-08-06 2021-09-03 中科雨辰科技有限公司 Data processing system for identifying identical solid models
CN113760996A (en) * 2021-09-09 2021-12-07 上海明略人工智能(集团)有限公司 A data integration method and system, device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
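QIANQIAN YU et al.: "Practice of constructing name authority database based on multi-source data integration", 《JCDL '19: PROCEEDINGS OF THE 18TH JOINT CONFERENCE ON DIGITAL LIBRARIES》 *
Ji Zhenyan et al.: "Scalable recommendation model fusing multi-source heterogeneous data", 《Journal of Beijing University of Posts and Telecommunications》, vol. 44, no. 3 *
Shi Xinhui et al.: "How to establish an enterprise intelligence information center", 《Enterprise Management》 *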

Also Published As

Publication number Publication date
ZA202211776B (en) 2022-12-21
CN114969041B (en) 2023-06-30

CN112131850B (en) Intelligent bill of lading template management method, device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant