CN115952178B

CN115952178B - Multi-level associated data heterogeneous data synchronization method

Info

Publication number: CN115952178B
Application number: CN202211541387.2A
Authority: CN
Inventors: 谢高峰
Original assignee: Beijing Huayu Jiupin Technology Co ltd
Current assignee: Beijing Huayu Jiupin Technology Co ltd
Priority date: 2022-12-01
Filing date: 2022-12-01
Publication date: 2024-08-23
Anticipated expiration: 2042-12-01
Also published as: CN115952178A

Abstract

The invention discloses a multi-level associated data heterogeneous data synchronization method. Full data synchronization is first performed on the binlog data of multiple data sources, including MySQL and PgSQL: the synchronization order is determined according to metadata management and heterogeneous data synchronization rules; the minimum id of each data source is queried in table order; an incremental synchronization thread is started before the full synchronization is executed, to ensure data consistency after the full synchronization; the data is scrolled through in a loop; the corresponding data is queried in batches, and if no data is returned the synchronization is finished, otherwise batch processing is performed; the batch processing converts the full data into incremental data in code, so that a unified method can be called and encapsulated. Data is then inserted at the target end: data from the new database and the target database is queried in batches and processed in a loop. The invention synchronizes multi-level parent-child data across multiple heterogeneous data sources, supporting both full update and incremental update.

Description

Multi-level associated data heterogeneous data synchronization method

Technical Field

The invention relates to the technical field of data processing, in particular to a multi-level associated data heterogeneous data synchronization method.

Background

1. Multi-level parent-child data: current heterogeneous data synchronization schemes are mostly built on single tables or wide tables, which creates a certain gap in understanding for developers and business personnel. Maintaining data in parent-child form improves business understanding and development efficiency, and some business scenarios cannot be handled by a wide table. Moreover, as distributed systems develop, the more services there are, the higher the cost of querying data, so the aggregation of many kinds of data becomes necessary. By configuring metadata, the invention supports maintenance and querying of data across multiple parent-child levels, which fits business understanding better.

2. Heterogeneity: at present, the supported data sources include MySQL, Oracle, SQL Server, db, PostgreSQL, MongoDB, Hive, HBase, Elasticsearch, and the like.

Several data synchronization schemes currently on the market are as follows:

Listening to the binlog and asynchronously sending messages to an MQ for consumption; however, these schemes do not guarantee the reliability of the MQ messages, and if a machine goes down, data inconsistency can result.

Synchronizing data directly table by table; however, the business semantics are unclear, and the data aggregation needs of distributed services are not addressed.

Configuring and managing the data in a preset data warehouse, together with custom statistical indicators, through a visual dashboard, with Hive used to perform the data filtering. This is a workable scheme, but its deployment is complex, and it depends on Hive, which small and medium-sized companies do not necessarily have.

Therefore, in view of the above problems, a multi-level associated data heterogeneous data synchronization method is needed that ensures the reliability and correctness of the data and lets small and medium-sized companies solve complex data synchronization by starting a server, avoiding an overly heavyweight solution.

Disclosure of Invention

The invention aims to provide a multi-level associated data heterogeneous data synchronization method. The invention is realized in the following way:

The invention provides a multi-level associated data heterogeneous data synchronization method, which is implemented through the following steps:

S1, first performing full data synchronization on the binlog data of multiple data sources, including but not limited to MySQL and PgSQL, specifically comprising the following steps:

S1.1, determining the synchronization order according to metadata management and heterogeneous data synchronization rules; the data from the data sources is divided into several tables according to these rules, denoted for example, without limitation, as table a, table b, and table c;

S1.2, querying the minimum id of each data source in table order;

S1.3, starting an incremental synchronization thread before executing the full synchronization, to ensure data consistency after the full synchronization;

S1.4, scrolling through the data in a loop;

S1.5, querying the corresponding data in batches; if no data is returned, the synchronization is finished; otherwise, performing batch processing;

S1.6, the batch processing converts the full data into incremental data in code, so that a unified method can be called and encapsulated;
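The full synchronization loop of step S1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes an SQLite source with an integer `id` column, and the function names `scroll_table` and `full_sync`, the batch size, and the event shape (`op`, `table`, `data`) are all hypothetical.

```python
import sqlite3
from typing import Iterator

# Hypothetical names throughout: the tables, columns, and event shape are
# illustrative assumptions, not taken from the patent.


def scroll_table(conn: sqlite3.Connection, table: str, batch_size: int) -> Iterator[list]:
    """Scroll one table by id and yield each batch as insert-style events."""
    cur = conn.cursor()
    # Query the minimum id of the table (table names come from trusted metadata).
    min_id = cur.execute(f"SELECT MIN(id) FROM {table}").fetchone()[0]
    if min_id is None:
        return  # empty table: nothing to synchronize
    last_id = min_id - 1
    while True:  # scroll through the data in a loop
        cur.execute(
            f"SELECT * FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )
        columns = [d[0] for d in cur.description]
        rows = cur.fetchall()
        if not rows:
            break  # empty batch: full synchronization of this table is finished
        # Rewrite full data as incremental-style events so that one handler
        # can process both full and incremental synchronization.
        yield [
            {"op": "INSERT", "table": table, "data": dict(zip(columns, row))}
            for row in rows
        ]
        last_id = rows[-1][columns.index("id")]


def full_sync(conn: sqlite3.Connection, tables_in_order: list, batch_size: int = 2) -> list:
    """Synchronize tables in the order given by the metadata rules."""
    events = []
    for table in tables_in_order:
        for batch in scroll_table(conn, table, batch_size):
            events.extend(batch)
    return events
```

Scrolling by `id > last_id` rather than by offset keeps each batch query cheap on large tables, which is one plausible reason for querying the minimum id first.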

S2, inserting data at the target end, specifically comprising the following steps:

S2.1, querying data from the new database and the target database in batches, and processing the data in a loop;

S2.2, checking whether the synchronized record exists in the new database;

S2.3, if it exists, judging whether the time of the synchronized data is later than the time in the new database; data meeting the condition is put into the binlogdatas array, and producer/consumer performance is optimized by borrowing the S0/S1 concept of the JVM;

S2.4, if the data to be updated does not exist, or the current operation is an update but no matching record is found at the target end, the update operation is changed into an insert operation as compensation;
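The selection and compensation logic of step S2 might look like the sketch below. It is an interpretation under stated assumptions: the target end is modeled as an in-memory dictionary, timestamps are plain integers, and `BinlogEvent`, `select_events`, and `apply_events` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass
class BinlogEvent:
    op: str           # "INSERT" or "UPDATE"
    key: int          # primary key of the record
    update_time: int  # hypothetical integer timestamp
    data: dict


def select_events(events: list, target: dict) -> list:
    """Return the events that should be written to the target end."""
    binlogdatas = []
    for ev in events:
        existing = target.get(ev.key)  # does the record already exist?
        if existing is None:
            # Compensation: an update with no matching record becomes an insert.
            binlogdatas.append(replace(ev, op="INSERT"))
        elif ev.update_time > existing["update_time"]:
            # Only data newer than the stored record is kept.
            binlogdatas.append(ev)
    return binlogdatas


def apply_events(binlogdatas: list, target: dict) -> None:
    """Write the selected events to the in-memory stand-in for the target."""
    for ev in binlogdatas:
        target[ev.key] = {**ev.data, "update_time": ev.update_time}
```

Dropping stale events by timestamp makes repeated delivery harmless, which matters because the incremental path may deliver the same change more than once.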

S3, performing incremental data synchronization;

S3.1, alongside the full data synchronization, incremental data listening is started at the same time; RocketMQ is selected as the MQ, and RocketMQ transactional messages are used to ensure zero message loss at the MQ;

S3.2, first, message-pulling tasks are started with multiple threads, and for each message it is first queried whether a consumption record exists. If a record exists in the state "consumed successfully, committed", the offset is committed directly to the MQ. A binlog message pull retry task obtains the unconsumed records and writes them into the memory queue. If the state is not "consumed successfully, committed", the record is likewise written into the memory queue;

S3.3, if no consumption record exists, a new consumption record is added to the database, and the data is put into binlogdatas to await asynchronous data processing;

S3.4, when pulling MQ messages, judging whether data was left unprocessed because the service was killed while an MQ message was half executed; if so, the unconsumed records are obtained and put into binlogdatas together; messages are pulled in a unified manner;

S3.5, in the other case, where the MQ message has been processed but committing the offset failed, it is judged that the MQ server or the client has a problem; all records in the consumed state are obtained, the offsets of the consumed messages are committed, and the record states are updated to committed;

S3.6, converting the full data into incremental data for a unified call;

S3.7, all processing of incremental data at the target end is handled uniformly and encapsulated;

S4, completing data synchronization.
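The consumption-record bookkeeping of step S3 can be illustrated with a small in-memory model. This sketch does not use the RocketMQ client API: the MQ, the consumption-record table, and the offset commit are all simulated, and the names `State`, `IncrementalConsumer`, and its methods are hypothetical.

```python
from collections import deque
from enum import Enum


class State(Enum):
    UNCONSUMED = "unconsumed"
    CONSUMED = "consumed"    # processed, offset not yet committed
    COMMITTED = "committed"  # "consumed successfully, committed"


class IncrementalConsumer:
    def __init__(self, apply):
        self.apply = apply           # handler that writes to the target end
        self.records = {}            # stand-in for the consumption-record table
        self.binlogdatas = deque()   # in-memory queue of message ids
        self.committed_offsets = []  # stand-in for offsets committed to the MQ

    def on_message(self, msg_id, payload):
        record = self.records.get(msg_id)
        if record is not None and record["state"] is State.COMMITTED:
            # Already consumed and committed: just commit the offset.
            self.committed_offsets.append(msg_id)
            return
        if record is None:
            # No consumption record yet: add one before queuing the data.
            self.records[msg_id] = {"state": State.UNCONSUMED, "payload": payload}
        self.binlogdatas.append(msg_id)

    def process_queue(self):
        while self.binlogdatas:
            record = self.records[self.binlogdatas.popleft()]
            self.apply(record["payload"])  # should be idempotent
            record["state"] = State.CONSUMED

    def retry_unconsumed(self):
        # After a kill, the memory queue is lost; re-queue unconsumed records.
        for msg_id, record in self.records.items():
            if record["state"] is State.UNCONSUMED and msg_id not in self.binlogdatas:
                self.binlogdatas.append(msg_id)

    def commit_consumed(self):
        # Processed but offset not committed: commit and mark as committed.
        for msg_id, record in self.records.items():
            if record["state"] is State.CONSUMED:
                self.committed_offsets.append(msg_id)
                record["state"] = State.COMMITTED
```

In this model a message that was processed but not yet committed is processed again on redelivery, so the handler passed as `apply` is assumed to be idempotent.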

Further, during full data synchronization, when child data arrives earlier than its parent data, it is temporarily cached according to a caching mechanism; after the main table data arrives, the correct data result is assembled and synchronized to the target end.
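One way to picture this caching mechanism is the sketch below, which buffers children until their parent arrives. It assumes an in-memory cache and target; the class name `ParentChildAssembler` and the `children` field are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ParentChildAssembler:
    def __init__(self):
        self.pending = defaultdict(list)  # parent id -> children that arrived early
        self.target = {}                  # parent id -> assembled document

    def on_child(self, parent_id, child: dict) -> None:
        doc = self.target.get(parent_id)
        if doc is None:
            # Parent not synchronized yet: cache the child temporarily.
            self.pending[parent_id].append(child)
        else:
            doc["children"].append(child)

    def on_parent(self, parent_id, parent: dict) -> None:
        existing = self.target.get(parent_id)
        children = existing["children"] if existing else []
        # Attach any children that were cached while waiting for the parent.
        children = children + self.pending.pop(parent_id, [])
        self.target[parent_id] = {**parent, "children": children}
```

For example, calling `on_child(1, {"item": "a"})` before `on_parent(1, {"name": "case"})` leaves nothing in the target until the parent arrives, after which the document contains both the parent fields and the cached child.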

Further, the data sources include, but are not limited to, MySQL, Oracle, SQL Server, db, PostgreSQL, MongoDB, Hive, HBase, and Elasticsearch.

Compared with the prior art, the invention has the beneficial effects that:

With support for both full update and incremental update, the invention synchronizes multi-level parent-child data across multiple heterogeneous data sources (an eventual-consistency scheme is adopted). It can listen to the binlog data of multiple data sources such as MySQL and PgSQL, convert the data through the metadata management service, and apply insert, update, delete, and similar operations, maintaining multi-level data that business systems can query conveniently.

The reliability and correctness of the data are ensured by the MQ zero-loss scheme and the various compensation schemes in the system. The parent-child structure provides a distributed aggregation function, solving the data aggregation problem for multi-level parent-child data. Small and medium-sized companies can achieve consistent synchronization of complex data by starting a server, avoiding an overly heavyweight solution.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some examples of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow diagram of the method of the present invention;

FIG. 2 is a schematic diagram of incremental data synchronization of the present invention;

FIG. 3 is a schematic diagram of a parent-child data implementation of the present invention.

Detailed Description

To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. The following detailed description of the embodiments, as presented in the figures, is therefore not intended to limit the scope of the invention as claimed, but merely represents selected embodiments. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.

Referring to FIGS. 1-3, a multi-level associated data heterogeneous data synchronization method includes the following steps:

S1.4, scrolling through the data in a loop;

S2.2, checking whether the synchronized record exists in the new database;

S3, performing incremental data synchronization;

S4, completing data synchronization.

In this embodiment, during full data synchronization, if child data arrives earlier than its parent data, it is temporarily cached according to a caching mechanism; after the main table data arrives, the correct data result is assembled and synchronized to the target end.

In this embodiment, the data sources include, but are not limited to, MySQL, Oracle, SQL Server, db, PostgreSQL, MongoDB, Hive, HBase, and Elasticsearch.

In this embodiment, the specific operation configuration is as follows:

{
  "domain": "xx", // index name
  "rangeScrollConfig": {
    "SourceConfigId": 1, // source configuration id
    "TargetConfigId": 2 // target configuration id
  },
  "StartTime": "2020-01-01 00:00:00", // start time
  "EndTime": "2023-01-01 00:00:00", // end time
  "StartScrollId": 1, // starting id
  "RangeScrollTaskConfig": { // synchronization metadata
    "Table": "xxx", // name of the table to synchronize
    "Columns": ["case_id", "apply_id", "create_time", "update_time"], // column names to synchronize
    "Child": [{ // list of child data
      "Columns": ["xxxx", "xxx", "xxx", "xxx"], // column set of the child data
      "Field": "xxx", // association field of the child table
      "ParentId": "xxx", // association field with the parent table
      "RelationType": "n", // the relation is 1 to n
      "Table": "xxx", // name of the child table
      "RelationIdField": "xxx" // association field of the table, used when the main table has no parentId
    }],
    "Parent": [{ // set of parent tables
      "Columns": ["xxx", "xxx"], // column set of the parent table
      "Field": "xxx", // association field name of the parent table
      "ParentId": "xxx", // association field between the parent table and the main table
      "Table": "xxx" // name of the parent table
    }]
  }
}
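As an illustration of how one child entry of this metadata could drive the assembly of a parent-child document, consider the sketch below. It rests on an assumed reading of the fields: a child row belongs to a main row when the child's `Field` value equals the main row's `ParentId` value, and `RelationType` of `n` yields a list. The function name `assemble_document`, the sample table and column names, and the handling of any relation type other than `n` are hypothetical.

```python
def assemble_document(main_row: dict, child_rows: list, child_cfg: dict) -> dict:
    """Nest matching child rows under the main row using one child config entry."""
    field = child_cfg["Field"]         # association field of the child table
    parent_id = child_cfg["ParentId"]  # association field on the main table
    matched = [
        # Keep only the columns listed in the child's column set.
        {col: row[col] for col in child_cfg["Columns"]}
        for row in child_rows
        if row[field] == main_row[parent_id]
    ]
    doc = dict(main_row)
    if child_cfg["RelationType"] == "n":
        doc[child_cfg["Table"]] = matched  # 1-to-n: keep every matching child
    else:
        # Assumed 1-to-1 handling: keep the first match, or None if absent.
        doc[child_cfg["Table"]] = matched[0] if matched else None
    return doc


# Hypothetical sample data for illustration only.
child_cfg = {
    "Table": "apply_item",
    "Columns": ["item_id", "amount"],
    "Field": "apply_id",
    "ParentId": "apply_id",
    "RelationType": "n",
}
main_row = {"case_id": 1, "apply_id": 10}
child_rows = [
    {"item_id": 100, "apply_id": 10, "amount": 5},
    {"item_id": 101, "apply_id": 11, "amount": 6},
]
document = assemble_document(main_row, child_rows, child_cfg)
```

Here `document` carries the main row's fields plus an `apply_item` list holding only the child whose `apply_id` is 10.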

The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations may be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A multi-level associated data heterogeneous data synchronization method, characterized by comprising the following steps:

S1.1, determining the synchronization order according to metadata management and heterogeneous data synchronization rules;

S1.4, scrolling through the data in a loop;

S1.5, querying the corresponding data in batches; if the returned result is empty, the data synchronization is finished; otherwise, performing batch processing;

S2.2, checking whether the synchronized record exists in the new database;

S3, performing incremental data synchronization;

S3.2, first starting message-pulling tasks with multiple threads, and first querying whether a consumption record exists for the message; if a record exists in the state "consumed successfully, committed", committing the offset directly to the MQ;

S3.3, checking in the message-pulling task whether the message has been processed; if so, directly changing the message state to completed; otherwise, putting the data into binlogdatas to await asynchronous data processing;

S3.4, when pulling MQ messages, judging whether data was left unprocessed because the service was killed while an MQ message was half executed; if so, obtaining the unconsumed records and putting them into binlogdatas together;

S4, completing data synchronization.

2. The multi-level associated data heterogeneous data synchronization method according to claim 1, wherein during full data synchronization, when child data arrives earlier than its parent data, it is temporarily cached according to a caching mechanism, and after the main table data arrives, the correct data result is assembled and synchronized to the target end.

3. The method according to claim 1, wherein in step S1.1, the data from the data sources is divided into several tables according to the metadata management and heterogeneous data synchronization rules, denoted for example, without limitation, as table a, table b, and table c.

4. The multi-level associated data heterogeneous data synchronization method according to claim 1, wherein the data sources include, but are not limited to, MySQL, Oracle, SQL Server, db, PostgreSQL, MongoDB, Hive, HBase, and Elasticsearch.

5. The method according to claim 1, wherein in step S3.2, a binlog message pull retry task obtains the unconsumed records and writes them into the memory queue.

6. The method according to claim 5, wherein in step S3.2, if the state is not "consumed successfully, committed", the record is likewise written into the memory queue.

7. The multi-level associated data heterogeneous data synchronization method according to claim 1, wherein in step S3.4, messages are pulled in a unified manner, and the unconsumed records are obtained by the binlog pull retry task and written into the memory queue.