
CN115952178B - Multi-level associated data heterogeneous data synchronization method - Google Patents


Info

Publication number
CN115952178B
Authority
CN
China
Prior art keywords
data
synchronization
message
full
heterogeneous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211541387.2A
Other languages
Chinese (zh)
Other versions
CN115952178A (en)
Inventor
谢高峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huayu Jiupin Technology Co ltd
Original Assignee
Beijing Huayu Jiupin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huayu Jiupin Technology Co ltd
Priority to CN202211541387.2A
Publication of CN115952178A
Application granted
Publication of CN115952178B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for synchronizing multi-level associated data across heterogeneous data sources. Full data synchronization is first performed on the binlog data of multiple data sources, including MySQL and PgSQL: the synchronization order is determined according to metadata management and the heterogeneous data synchronization rules; the minimum id of each data source is queried in table order; an incremental synchronization thread is started before full synchronization is executed, so that the data is consistent once full synchronization completes; the data is then scrolled through in a loop and queried in batches. If a batch query returns nothing, the data has been fully synchronized; otherwise the batch is processed. Batch processing converts the full data into incremental data in code, so that one uniformly packaged method call handles both. Data is then inserted at the target end, which includes querying the new database and the target database in batches and processing the data in a loop. The invention synchronizes multi-level parent-child data across multiple heterogeneous sources while supporting both full update and incremental update.

Description

Multi-level associated data heterogeneous data synchronization method
Technical Field
The invention relates to the technical field of data processing, in particular to a multi-level associated data heterogeneous data synchronization method.
Background
1. Multi-layer parent-child data: current heterogeneous data synchronization schemes are essentially built on a single table or a wide table. This can mislead developers and business personnel who try to understand the data, whereas maintaining parent-child data improves both business understanding and development efficiency. A wide table does not solve this business scenario. In addition, as distributed systems develop, the more services there are, the higher the cost of querying data becomes, so data from many services has to be aggregated. By configuring metadata, the invention supports maintaining and querying data across multiple parent-child levels, which matches the business view of the data more closely.
2. Heterogeneity: at present, the supported data sources include mysql, oracle, sqlserver, db, postgresql, mongodb, hive, hbase, elasticsearch and the like.
Several data synchronization schemes are available on the market:
Monitoring the binlog, asynchronously sending messages to an MQ, and then consuming them. These schemes do not guarantee the reliability of the MQ messages, so data can become inconsistent if a machine goes down.
Synchronizing data directly table by table. The business semantics are unclear, and the data aggregation needs of distributed services are not addressed.
Configuring and managing the data in a preset data warehouse and custom statistical indicators through a visual dashboard, with Hive performing the data filtering. This is a workable scheme, but deployment is complex because it depends on Hive, which small and medium-sized companies do not necessarily have.
Therefore, in view of the above problems, a multi-level associated data heterogeneous data synchronization method is needed that guarantees the reliability and correctness of data and lets small and medium-sized companies achieve consistent synchronization of complex data by starting a server, avoiding an overly heavyweight scheme.
Disclosure of Invention
The invention aims to provide a multi-level associated data heterogeneous data synchronization method. The invention is realized in the following way:
The invention provides a multi-level associated data heterogeneous data synchronization method, which is implemented by the following steps:
S1. First, perform full data synchronization on the binlog data of multiple data sources, including but not limited to MySQL and PgSQL, specifically as follows:
S1.1. Determine the synchronization order according to metadata management and the heterogeneous data synchronization rules; the data from the data sources is divided into multiple tables according to metadata management and the heterogeneous data synchronization rules, including but not limited to being denoted as table a, table b and table c.
S1.2. Query the minimum id of each data source, following the table order;
S1.3. Start the incremental synchronization thread before executing full synchronization, to ensure that the data is consistent after full synchronization;
S1.4. Scroll through the data in a loop;
S1.5. Query the corresponding data in batches; if no data is returned, the data has been fully synchronized; otherwise, process the batch;
S1.6. Batch processing converts the full data into incremental data in code, so that one uniformly packaged method is called for both;
S2. Insert the data at the target end, specifically as follows:
S2.1. Query the data of the new database and the target database in batches and process it in a loop;
S2.2. Check whether the synchronized record already exists in the new database;
S2.3. If it exists, determine whether the time of the synchronized data is later than the time in the new database; put the data that meets the condition into the binlogdatas array, and apply producer/consumer performance optimization within the JVM;
S2.4. If the data to be updated does not exist, or an update is in progress but the record cannot be found at the target end, the update operation is converted into an insert operation as compensation;
S3. Perform incremental data synchronization:
S3.1. While the full data is being synchronized, start incremental data monitoring at the same time; RocketMQ is chosen as the MQ, and RocketMQ transactional messages are used to ensure zero message loss at the MQ;
S3.2. First start the message pulling task with multiple threads and query whether a consumption record exists for the message. If one exists and its state is "consumed successfully, committed", commit the offset to the MQ directly. Unconsumed records are obtained by the binlog message pull-retry task and written into the memory queue. If the state is not "consumed successfully, committed", the record is likewise written into the memory queue.
S3.3. If no consumption record exists, add a new consumption record to the database and put the data into binlogdatas to await asynchronous processing;
S3.4. When pulling MQ messages, determine whether any data was left unprocessed because the service was killed while a message was half executed; if so, obtain the unconsumed records and put them into binlogdatas together. Messages are pulled in a unified manner.
S3.5. In the other case, where the MQ message was processed but committing the offset failed, conclude that the MQ server or the client had a problem; obtain all records in the consumed state, commit the offsets of the consumed messages, and update the record states to committed;
S3.6. Convert the full data into incremental data and invoke it through a unified call;
S3.7. The incremental data path handles all processing on the target end in a uniform, packaged manner;
S4. Data synchronization is complete.
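The full synchronization loop in steps S1.2 and S1.4 to S1.6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the in-memory `SOURCE` list stands in for a real source table, and the names `scroll_batches`, `to_incremental` and `run_full_sync` are hypothetical.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]

def scroll_batches(fetch: Callable[[int, int], List[Row]],
                   start_id: int, batch_size: int) -> Iterator[List[Row]]:
    """Yield batches of rows with id >= cursor, advancing past the last id seen."""
    cursor = start_id
    while True:
        batch = fetch(cursor, batch_size)
        if not batch:
            # An empty batch means there is nothing left to synchronize (S1.5).
            return
        yield batch
        cursor = int(batch[-1]["id"]) + 1

def to_incremental(table: str, rows: List[Row]) -> List[Dict[str, object]]:
    """Wrap full-sync rows as insert events so one handler serves both paths (S1.6)."""
    return [{"table": table, "op": "insert", "data": row} for row in rows]

# Hypothetical in-memory stand-in for a source table, ordered by id.
SOURCE: List[Row] = [{"id": i, "name": f"case-{i}"} for i in (3, 4, 7, 8, 12)]

def fetch_from_memory(min_id: int, limit: int) -> List[Row]:
    """Return up to `limit` rows whose id is at least `min_id`."""
    return [r for r in SOURCE if r["id"] >= min_id][:limit]

def run_full_sync(table: str, batch_size: int = 2) -> List[Dict[str, object]]:
    """Scroll the whole table from its minimum id and collect insert events."""
    min_id = min(int(r["id"]) for r in SOURCE)  # S1.2: minimum id of the source
    events: List[Dict[str, object]] = []
    for batch in scroll_batches(fetch_from_memory, min_id, batch_size):
        events.extend(to_incremental(table, batch))
    return events
```

Scrolling by id rather than by offset keeps each batch query cheap on large tables, which is one plausible reason for querying the minimum id first.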
Further, during full data synchronization, when child data arrives earlier than its parent data, it is temporarily cached according to a caching mechanism; after the main table data arrives, the correct data result is assembled and synchronized to the target end.
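The caching of child data that arrives before its parent can be illustrated with a small sketch. The text does not describe the caching mechanism itself, so the in-memory dictionaries and the class name `ParentChildAssembler` below are assumptions made only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional

class ParentChildAssembler:
    """Buffer child rows until their parent (main table) row has arrived."""

    def __init__(self) -> None:
        self.parents: Dict[int, dict] = {}
        self.pending: Dict[int, List[dict]] = defaultdict(list)

    def on_child(self, parent_id: int, child: dict) -> Optional[dict]:
        """Attach the child if the parent is known; otherwise cache it."""
        parent = self.parents.get(parent_id)
        if parent is None:
            self.pending[parent_id].append(child)
            return None  # nothing to send to the target end yet
        parent["children"].append(child)
        return parent

    def on_parent(self, parent_id: int, row: dict) -> dict:
        """Assemble the parent with any children that were cached earlier."""
        doc = dict(row, children=self.pending.pop(parent_id, []))
        self.parents[parent_id] = doc
        return doc
```

A production version would presumably bound or expire the cache; that detail is outside what the text specifies.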
Further, the data sources include, but are not limited to, mysql, oracle, sqlserver, db, postgresql, mongodb, hive, hbase and elasticsearch.
Compared with the prior art, the invention has the beneficial effects that:
With support for both full update and incremental update, the invention synchronizes multi-level parent-child data across multiple heterogeneous sources (an eventual consistency scheme is chosen). It can monitor the binlog data of multiple data sources such as MySQL and PgSQL, convert the data through the metadata management service, perform operations such as insert, modify and delete, and maintain multi-level data that business systems can query conveniently.
The reliability and correctness of the data are ensured by the MQ zero-loss scheme and the various compensation schemes in the system. The parent-child structure provides a distributed aggregation function and solves the data aggregation problem for multi-layer parent-child data. Small and medium-sized companies can achieve consistent synchronization of complex data by starting a server, which avoids an overly heavyweight scheme.
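One way to read the compensation schemes of steps S3.2 to S3.5 is as a small state machine over consumption records. The sketch below is an assumption-laden illustration: it keeps records in memory instead of a database, does not call any RocketMQ API, and the state names and the class `ConsumptionLog` are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, List

class State(Enum):
    PENDING = "pending"      # record exists, processing did not finish
    CONSUMED = "consumed"    # processed, offset not yet committed
    COMMITTED = "committed"  # processed and offset committed

class ConsumptionLog:
    """Track per-message consumption state so retries stay idempotent."""

    def __init__(self) -> None:
        self.records: Dict[str, State] = {}

    def handle(self, msg_id: str,
               process: Callable[[str], None],
               commit_offset: Callable[[str], None]) -> str:
        state = self.records.get(msg_id)
        if state is State.COMMITTED:
            return "skipped"  # already fully handled
        if state is not State.CONSUMED:
            # New or half-executed message: (re)process it.
            self.records[msg_id] = State.PENDING
            process(msg_id)
            self.records[msg_id] = State.CONSUMED
        # Processed earlier but the offset commit failed: only commit again.
        commit_offset(msg_id)
        self.records[msg_id] = State.COMMITTED
        return "committed"

    def unconsumed(self) -> List[str]:
        """Messages left half executed, for example after the service was killed."""
        return [m for m, s in self.records.items() if s is State.PENDING]
```

If `process` raises, the record stays pending and can be re-queued; if `commit_offset` raises, the record stays consumed, so a retry commits the offset without processing the message twice.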
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some examples of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow diagram of the method of the present invention;
FIG. 2 is a schematic diagram of incremental data synchronization of the present invention;
FIG. 3 is a schematic diagram of a parent-child data implementation of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. The following detailed description of the embodiments, as presented in the figures, is therefore not intended to limit the scope of the invention as claimed, but merely represents selected embodiments. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
Referring to FIGS. 1-3, a multi-level associated data heterogeneous data synchronization method includes the following steps:
S1. First, perform full data synchronization on the binlog data of multiple data sources, including but not limited to MySQL and PgSQL, specifically as follows:
S1.1. Determine the synchronization order according to metadata management and the heterogeneous data synchronization rules; the data from the data sources is divided into multiple tables according to metadata management and the heterogeneous data synchronization rules, including but not limited to being denoted as table a, table b and table c.
S1.2. Query the minimum id of each data source, following the table order;
S1.3. Start the incremental synchronization thread before executing full synchronization, to ensure that the data is consistent after full synchronization;
S1.4. Scroll through the data in a loop;
S1.5. Query the corresponding data in batches; if no data is returned, the data has been fully synchronized; otherwise, process the batch;
S1.6. Batch processing converts the full data into incremental data in code, so that one uniformly packaged method is called for both;
S2. Insert the data at the target end, specifically as follows:
S2.1. Query the data of the new database and the target database in batches and process it in a loop;
S2.2. Check whether the synchronized record already exists in the new database;
S2.3. If it exists, determine whether the time of the synchronized data is later than the time in the new database; put the data that meets the condition into the binlogdatas array, and apply producer/consumer performance optimization within the JVM;
S2.4. If the data to be updated does not exist, or an update is in progress but the record cannot be found at the target end, the update operation is converted into an insert operation as compensation;
S3. Perform incremental data synchronization:
S3.1. While the full data is being synchronized, start incremental data monitoring at the same time; RocketMQ is chosen as the MQ, and RocketMQ transactional messages are used to ensure zero message loss at the MQ;
S3.2. First start the message pulling task with multiple threads and query whether a consumption record exists for the message. If one exists and its state is "consumed successfully, committed", commit the offset to the MQ directly. Unconsumed records are obtained by the binlog message pull-retry task and written into the memory queue. If the state is not "consumed successfully, committed", the record is likewise written into the memory queue.
S3.3. If no consumption record exists, add a new consumption record to the database and put the data into binlogdatas to await asynchronous processing;
S3.4. When pulling MQ messages, determine whether any data was left unprocessed because the service was killed while a message was half executed; if so, obtain the unconsumed records and put them into binlogdatas together. Messages are pulled in a unified manner.
S3.5. In the other case, where the MQ message was processed but committing the offset failed, conclude that the MQ server or the client had a problem; obtain all records in the consumed state, commit the offsets of the consumed messages, and update the record states to committed;
S3.6. Convert the full data into incremental data and invoke it through a unified call;
S3.7. The incremental data path handles all processing on the target end in a uniform, packaged manner;
S4. Data synchronization is complete.
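The target-end handling in steps S2.2 to S2.4 can be sketched as a single apply function. This is one possible reading of those steps, not a definitive implementation: the target is modelled as a dictionary keyed by id, `update_time` is assumed to be a comparable timestamp, and `apply_event` is a hypothetical name.

```python
from typing import Dict

def apply_event(target: Dict[int, dict], event: dict) -> str:
    """Apply one change event to the target and report what happened."""
    row = event["data"]
    existing = target.get(row["id"])
    if existing is None:
        # S2.4: the row to update is missing at the target end,
        # so the update is turned into an insert as compensation.
        target[row["id"]] = dict(row)
        return "inserted"
    if row["update_time"] > existing["update_time"]:
        # S2.3: only data newer than the stored record is applied.
        target[row["id"]] = dict(row)
        return "updated"
    # Stale data from the full scan must not overwrite newer incremental data.
    return "skipped"
```

Comparing timestamps before writing is what lets the full scan and the incremental thread run at the same time without the older of two versions winning.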
In this embodiment, during full data synchronization, if child data arrives earlier than its parent data, it is temporarily cached according to a caching mechanism; after the main table data arrives, the correct data result is assembled and synchronized to the target end.
In this embodiment, the data sources include, but are not limited to, mysql, oracle, sqlserver, db, postgresql, mongodb, hive, hbase and elasticsearch.
In this embodiment, a specific configuration is as follows:
{
  "domain": "xx", // index name
  "rangeScrollConfig": {
    "SourceConfigId": 1, // source configuration id
    "TargetConfigId": 2 // target configuration id
  },
  "StartTime": "2020-01-01 00:00:00", // start time
  "EndTime": "2023-01-01 00:00:00", // end time
  "StartScrollId": 1, // starting id
  "RangeScrollTaskConfig": { // synchronized metadata
    "Table": "xxx", // name of the table to synchronize
    "Columns": ["case_id", "apply_id", "create_time", "update_time"], // column names involved in synchronization
    "child": [ { // list of child data
      "Columns": ["xxxx", "xxx", "xxx", "xxx"], // column set of the child data
      "Field": "xxx", // associated field of the child table
      "ParentId": "xxx", // associated field of the parent table
      "RelationType": "n", // the relationship is 1 to n
      "Table": "xxx", // child table name
      "RelationIdField": "xxx" // associated field of this table, used when the main table's parentId is not available
    }
    ],
    "Parent": [ { // set of parent tables
      "Columns": ["xxx", "xxx"], // column set of the parent table
      "Field": "xxx", // associated field name of the parent table
      "ParentId": "xxx", // associated field between the parent table and the main table
      "Table": "xxx" // parent table name
    }]
  }
}
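Because the configuration above carries `//` comments, a standard JSON parser will reject it as written. The following sketch shows one way such a file could be loaded and its related tables listed. The table names in `SAMPLE` and the helper names are hypothetical, and the key spellings simply follow the example above.

```python
import json
import re

# Match either a complete string literal (kept) or a // comment (dropped).
_TOKEN = re.compile(r'("(?:\\.|[^"\\])*")|//[^\n]*')

def load_commented_json(text: str) -> dict:
    """Drop // comments that sit outside string literals, then parse as JSON."""
    return json.loads(_TOKEN.sub(lambda m: m.group(1) or "", text))

def related_tables(config: dict) -> list:
    """List (role, table) pairs for the main table and its parents and children."""
    task = config["RangeScrollTaskConfig"]
    tables = [("main", task["Table"])]
    tables += [("parent", p["Table"]) for p in task.get("Parent", [])]
    tables += [("child", c["Table"]) for c in task.get("child", [])]
    return tables

# Hypothetical sample; the table names are placeholders, not from the patent.
SAMPLE = '''
{
  "domain": "xx", // index name
  "RangeScrollTaskConfig": {
    "Table": "case_main",
    "child": [ { "Table": "case_item", "RelationType": "n" } ],
    "Parent": [ { "Table": "case_owner" } ]
  }
}
'''
```

Matching string literals first keeps values that contain `//`, such as URLs, intact while the comments are removed.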
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations may be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A multi-level associated data heterogeneous data synchronization method, characterized by comprising the following steps:
S1. First, perform full data synchronization on the binlog data of multiple data sources, including but not limited to MySQL and PgSQL, specifically as follows:
S1.1. Determine the synchronization order according to metadata management and the heterogeneous data synchronization rules;
S1.2. Query the minimum id of each data source, following the table order;
S1.3. Start the incremental synchronization thread before executing full synchronization, to ensure that the data is consistent after full synchronization;
S1.4. Scroll through the data in a loop;
S1.5. Query the corresponding data in batches; if the returned result is empty, the data has been fully synchronized; otherwise, process the batch;
S1.6. Batch processing converts the full data into incremental data in code, so that one uniformly packaged method is called for both;
S2. Insert the data at the target end, specifically as follows:
S2.1. Query the data of the new database and the target database in batches and process it in a loop;
S2.2. Check whether the synchronized record already exists in the new database;
S2.3. If it exists, determine whether the time of the synchronized data is later than the time in the new database; put the data that meets the condition into the binlogdatas array, and apply producer/consumer performance optimization within the JVM;
S2.4. If the data to be updated does not exist, or an update is in progress but the record cannot be found at the target end, the update operation is converted into an insert operation as compensation;
S3. Perform incremental data synchronization:
S3.1. While the full data is being synchronized, start incremental data monitoring at the same time; RocketMQ is chosen as the MQ, and RocketMQ transactional messages are used to ensure zero message loss at the MQ;
S3.2. First start the message pulling task with multiple threads and query whether a consumption record exists for the message; if one exists and its state is "consumed successfully, committed", commit the offset to the MQ directly;
S3.3. In the message pulling task, check whether the message has already been processed; if so, directly change the message state to completed; otherwise, put the data into binlogdatas to await asynchronous processing;
S3.4. When pulling MQ messages, determine whether any data was left unprocessed because the service was killed while a message was half executed; if so, obtain the unconsumed records and put them into binlogdatas together;
S3.5. In the other case, where the MQ message was processed but committing the offset failed, conclude that the MQ server or the client had a problem; obtain all records in the consumed state, commit the offsets of the consumed messages, and update the record states to committed;
S3.6. Convert the full data into incremental data and invoke it through a unified call;
S3.7. The incremental data path handles all processing on the target end in a uniform, packaged manner;
S4. Data synchronization is complete.
2. The multi-level associated data heterogeneous data synchronization method according to claim 1, wherein during full data synchronization, when child data arrives earlier than its parent data, it is temporarily cached according to a caching mechanism, and after the main table data arrives, the correct data result is assembled and synchronized to the target end.
3. The method according to claim 1, wherein in step S1.1 the data from the data sources is divided into multiple tables according to metadata management and the heterogeneous data synchronization rules, including but not limited to being denoted as table a, table b and table c.
4. The multi-level associated data heterogeneous data synchronization method according to claim 1, wherein the data sources include, but are not limited to, mysql, oracle, sqlserver, db, postgresql, mongodb, hive, hbase and elasticsearch.
5. The method according to claim 1, wherein in step S3.2 unconsumed records are obtained by the binlog message pull-retry task and written into the memory queue.
6. The method according to claim 5, wherein in step S3.2, if the state is not "consumed successfully, committed", the record is likewise written into the memory queue.
7. The multi-level associated data heterogeneous data synchronization method according to claim 1, wherein in step S3.4 messages are pulled in a unified manner, and unconsumed records are obtained by the binlog pull-retry task and written into the memory queue.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211541387.2A CN115952178B (en) 2022-12-01 2022-12-01 Multi-level associated data heterogeneous data synchronization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211541387.2A CN115952178B (en) 2022-12-01 2022-12-01 Multi-level associated data heterogeneous data synchronization method

Publications (2)

Publication Number Publication Date
CN115952178A (en) 2023-04-11
CN115952178B (en) 2024-08-23

Family

ID=87286765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211541387.2A Active CN115952178B (en) 2022-12-01 2022-12-01 Multi-level associated data heterogeneous data synchronization method

Country Status (1)

Country Link
CN (1) CN115952178B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119322806A (en) * 2024-09-26 2025-01-17 浪潮卓数大数据产业发展有限公司 RocketMQ-based data synchronization method and RocketMQ-based data synchronization system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103885986A (en) * 2012-12-21 2014-06-25 阿里巴巴集团控股有限公司 Main and auxiliary database synchronization method and device
CN110750647A (en) * 2019-10-17 2020-02-04 北京华宇信息技术有限公司 Construction method of ELP model of multi-source heterogeneous information data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001046889A2 (en) * 1999-12-22 2001-06-28 Accenture Llp A method for executing a network-based credit application process
US8301593B2 (en) * 2008-06-12 2012-10-30 Gravic, Inc. Mixed mode synchronous and asynchronous replication system
CN115080666A (en) * 2022-07-05 2022-09-20 携程商旅信息服务(上海)有限公司 Data synchronization method, system, electronic device and storage medium

Also Published As

Publication number Publication date
CN115952178A (en) 2023-04-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant