CN119474089A - Data replication method, device, electronic device and storage medium - Google Patents

Data replication method, device, electronic device and storage medium

Info

Publication number
CN119474089A
Authority
CN
China
Prior art keywords
target
data
identifier
name
record item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411514062.4A
Other languages
Chinese (zh)
Inventor
李培林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202411514062.4A priority Critical patent/CN119474089A/en
Publication of CN119474089A publication Critical patent/CN119474089A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a data replication method, device, electronic device and storage medium, belonging to the field of big data technology. The method includes: obtaining configuration information, where the configuration information includes data information of a source data table and an identifier of a target library, and the data information includes a first table name and a table structure of the source data table; processing the table structure according to a first rule to obtain a first identifier; matching the first table name with second table names in a metadata record table to obtain a target record item, where the metadata record table includes multiple record items, each record item includes a second table name and a second identifier, and the second identifier is obtained by processing the table structure corresponding to the second table name according to the first rule; if the second identifier of the target record item differs from the first identifier, creating a first target table for the source data table; and copying the data of the source data table into the first target table. Copying the data of the source data table into the first target table synchronizes the data of the source data table and the first target table.

Description

Data copying method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of big data technologies, and in particular, to a data replication method, a device, an electronic apparatus, and a storage medium.
Background
Hive and Impala are two common tools in the big data field. In the usual scheme that combines them, Hive serves as a data warehouse for data storage and batch processing, while Impala is used for real-time query and interactive analysis and obtains faster query response times by executing queries directly on the Hadoop data nodes. The two tools can work together to provide a comprehensive data analysis solution. Creating Impala tables and pointing their data file locations at Hive files is a common derived scheme, but it is prone to data consistency problems when multiple Impala tables operate on the same Hive data.
Disclosure of Invention
The embodiments of the application mainly aim to provide a data replication method, a device, electronic equipment and a storage medium, which can solve the problem of inconsistent data when a plurality of Impala tables operate on the same Hive data.
To achieve the above object, a first aspect of an embodiment of the present application provides a data replication method, including:
acquiring configuration information, wherein the configuration information comprises data information of a source data table and an identifier of a target library, and the data information comprises a first table name and a table structure of the source data table;
processing the table structure according to a first rule to obtain a first identifier;
matching the first table name with a second table name in a metadata record table to obtain a target record item, wherein the metadata record table comprises a plurality of record items, each record item comprises a second table name and a second identifier, the second identifier is obtained by processing the table structure corresponding to the second table name according to the first rule, and the second table name in the target record item is the same as the first table name;
if the second identifier of the target record item is different from the first identifier, creating a first target table for the source data table, wherein the first target table is a table in the target library, and the source data table and the first target table have the same table structure;
copying the data of the source data table into the first target table.
In some embodiments, creating a first target table for the source data table if the second identifier of the target record item is different from the first identifier includes:
if the second identifier of the target record item is different from the first identifier and the target record item meets a first condition, creating, for the source data table, the first target table and a data partition corresponding to the first target table, wherein the table structure of the source data table is the same as that of the first target table, and the first condition comprises that the target record item does not comprise the identifier of a table in the target library;
recording the table name of the first target table and partition information of the data partition corresponding to the first target table in the metadata record table;
and the copying the data of the source data table into the first target table comprises:
copying the data in the data partition corresponding to the source data table into the data partition corresponding to the first target table.
In some embodiments, recording the table name of the first target table and the partition information of the data partition corresponding to the first target table in the metadata record table includes:
if the target record item does not comprise the identifier of a table in the target library, recording the table name of the first target table and the partition information of the data partition corresponding to the first target table in the target record item;
if the target record item comprises the identifier of a table in the target library, adding a new record item to the metadata record table, wherein the newly added record item comprises the first table name, the first identifier, the table name of the first target table and the partition information of the data partition corresponding to the first target table.
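The two recording branches above can be sketched as follows. This is only an illustration under stated assumptions: the record items are modelled as Python dictionaries, and the keys `source_table`, `identifier`, `target_table` and `partition_info`, as well as the function name, are hypothetical and do not come from the application. The text does not say whether the stored second identifier is refreshed in the first branch, so the sketch leaves it unchanged.

```python
def record_first_target_table(records, target_item, first_table_name,
                              first_identifier, target_table_name,
                              partition_info):
    """Record the first target table in a metadata record table (a list of dicts)."""
    if target_item.get("target_table") is None:
        # Branch 1: the target record item names no table in the target
        # library yet, so the new table is recorded in that item itself.
        target_item["target_table"] = target_table_name
        target_item["partition_info"] = partition_info
        return target_item
    # Branch 2: the target record item already names a table in the target
    # library, so a new record item is appended instead.
    new_item = {
        "source_table": first_table_name,
        "identifier": first_identifier,
        "target_table": target_table_name,
        "partition_info": partition_info,
    }
    records.append(new_item)
    return new_item
```

Returning the item that was written lets a caller tell the two branches apart by checking whether the returned item is the original target record item.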
In some embodiments, in the case that the target record item does not include the identifier of a table in the target library, the table name of the first target table is the same as the first table name;
in the case that the target record item includes the identifier of a table in the target library, the table name of the first target table is a temporary table name, and the temporary table name is determined according to the first table name and the time at which the first target table is created.
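A minimal sketch of this naming rule follows. The application only states that the temporary name is determined from the first table name and the creation time; the `_tmp_` infix, the timestamp format and the function name are assumptions made for illustration.

```python
from datetime import datetime


def first_target_table_name(first_table_name, record_has_target_table, created_at):
    """Choose the table name of the first target table."""
    if not record_has_target_table:
        # No table recorded in the target library yet: reuse the source name.
        return first_table_name
    # A table already exists in the target library: use a temporary name built
    # from the source name and the creation time (format is an assumption).
    return f"{first_table_name}_tmp_{created_at:%Y%m%d%H%M%S}"


created = datetime(2024, 10, 28, 9, 30, 0)
print(first_target_table_name("orders", False, created))  # orders
print(first_target_table_name("orders", True, created))   # orders_tmp_20241028093000
```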
In some embodiments, after copying the data of the source data table into the first target table, the method further comprises:
in the case that the target record item does not include the identifier of a table in the target library, recording partition information of the first target table in a partition information table;
in the case that the target record item includes the identifier of a table in the target library, deleting the partition information of the old table from the partition information table, deleting the target record item and the data stored in the data partition corresponding to the old table, changing the temporary table name in the newly added record item of the metadata record table to the first table name, and recording the partition information of the first target table in the partition information table;
wherein the partition information table comprises a plurality of information items, and each information item describes the storage address and/or the stored data of the data partition corresponding to one table.
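The post-copy bookkeeping can be sketched with in-memory stand-ins. Everything named here is hypothetical: record items are dictionaries, `partition_table` maps a table name to its partition information, and `storage` maps partition information to the stored data. A real system would issue the corresponding metadata and file operations instead.

```python
def finalize_after_copy(records, partition_table, storage, target_item, new_item):
    """Update bookkeeping after data has been copied into the first target table.

    new_item is the record item that describes the first target table; it is
    the same object as target_item when no old table existed.
    """
    if new_item is target_item:
        # No old table: only record the partition information.
        partition_table[target_item["target_table"]] = target_item["partition_info"]
        return
    old_table = target_item["target_table"]
    # Delete the old table's partition information and its stored data.
    old_partition = partition_table.pop(old_table, None)
    storage.pop(old_partition, None)
    # Delete the old target record item.
    records.remove(target_item)
    # Rename the temporary table to the first table name, then record its partition.
    new_item["target_table"] = new_item["source_table"]
    partition_table[new_item["target_table"]] = new_item["partition_info"]
```

Deleting the old entries before the rename keeps the sketch correct when the old table and the renamed table share the same name, which is the case when the old table was created under the first table name.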
In some embodiments, the target record item further includes partition information of a data partition corresponding to a second target table, where the second target table is a table in the target library;
after matching the first table name with the second table name in the metadata record table to obtain the target record item, the method further includes:
if the second identifier of the target record item is the same as the first identifier, obtaining the partition information of the data partition corresponding to the second target table from the target record item;
copying the data in the data partition corresponding to the source data table into the data partition corresponding to the second target table.
In some embodiments, the source data table is a Hive table and the first target table is an Impala table.
To achieve the above object, a second aspect of an embodiment of the present application provides a data copying apparatus, including:
a first acquisition module, configured to acquire configuration information, wherein the configuration information comprises data information of a source data table and an identifier of a target library, and the data information comprises a first table name and a table structure of the source data table;
a second acquisition module, configured to process the table structure according to a first rule to obtain a first identifier;
a matching module, configured to match the first table name with a second table name in a metadata record table to obtain a target record item, wherein the metadata record table comprises a plurality of record items, each record item comprises a second table name and a second identifier, the second identifier is obtained by processing the table structure corresponding to the second table name according to the first rule, and the second table name in the target record item is the same as the first table name;
a creating module, configured to create a first target table for the source data table if the second identifier of the target record item is different from the first identifier, wherein the first target table is a table in the target library, and the source data table and the first target table have the same table structure;
a data copying module, configured to copy the data of the source data table into the first target table.
To achieve the above object, a third aspect of the embodiments of the present application proposes an electronic device, including a memory storing a computer program and a processor implementing the method according to the first aspect when the processor executes the computer program.
To achieve the above object, a fourth aspect of the embodiments of the present application proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of the first aspect.
According to the data replication method, device, electronic equipment and storage medium provided by the embodiments of the present application, configuration information is acquired, the configuration information comprising data information of a source data table and an identifier of a target library, and the data information comprising a first table name and a table structure of the source data table. The table structure is processed according to a first rule to obtain a first identifier. The first table name is matched with the second table names in a metadata record table to obtain a target record item; the metadata record table comprises a plurality of record items, each record item comprises a second table name and a second identifier, the second identifier is obtained by processing the table structure corresponding to the second table name according to the first rule, and the second table name in the target record item is the same as the first table name. If the second identifier of the target record item differs from the first identifier, a first target table is created for the source data table, the first target table being a table in the target library with the same table structure as the source data table, and the data of the source data table is copied into the first target table. Through this process, when the second identifier in the target record item differs from the first identifier (that is, when the table structure of the source data table has changed), the data of the source data table is copied into the first target table, so that the source data table and the first target table are synchronized. When the scheme is applied in big data settings that use Hive and Impala, inconsistency between the data of the Hive table and the Impala table can be avoided.
Drawings
FIG. 1 is a flow chart of a data replication method according to an embodiment of the present application;
FIG. 2 is a flowchart of step S104 in FIG. 1 according to an embodiment of the present application;
FIG. 3 is a flowchart of step S1042 in FIG. 2 according to an embodiment of the present application;
FIG. 4 is another flowchart of a data replication method according to an embodiment of the present application;
FIG. 5 is a flow chart of a data replication method according to an embodiment of the present application;
FIG. 6 is a flow chart of a data replication method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a data replication device according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
First, several terms used in the present application are explained:
Hive: a Hadoop-based data warehouse tool for data extraction, transformation and loading; it provides a mechanism to store, query and analyze large-scale data stored in Hadoop. Hive can map structured data files onto database tables, provide SQL query functions, and convert SQL statements into MapReduce tasks for execution. Its advantages are a low learning cost and the ability to produce MapReduce statistics quickly through SQL-like statements, so that no dedicated MapReduce application needs to be developed; it is well suited to statistical analysis in a data warehouse;
Hive table: a table in Hive;
Impala: a massively parallel processing (MPP) SQL query engine for processing large amounts of data stored in Hadoop clusters. Impala is a query system developed by Cloudera that provides SQL semantics for querying big data stored in the Hadoop Distributed File System (HDFS);
Impala table: a table in Impala.
Hive and Impala are two common tools in the big data field. In the usual scheme that combines them, Hive serves as a data warehouse for data storage and batch processing, while Impala is used for real-time query and interactive analysis and obtains faster query response times by executing queries directly on the Hadoop data nodes. The two tools can work together to provide a comprehensive data analysis solution. Creating an external table in Impala and pointing its data file location at Hive files is a common derived scheme, but it is prone to data consistency problems when multiple Impala tables operate on the same Hive data. In addition, when Hive field information or metadata changes, the change cannot be synchronized to Impala, and an Impala table often has to be rebuilt manually to follow the metadata change. In other words, the existing way of using Hive tables and Impala tables together easily leads to data inconsistency.
In order to solve the above problems, the embodiments of the present application provide a data replication method, apparatus, electronic device, and storage medium.
The application is operational with numerous general purpose or special purpose computer system environments or configurations, such as a personal computer, a server computer, a hand-held or portable device, a tablet device, a multiprocessor system, a microprocessor-based system, a set top box, a programmable consumer electronics device, a network PC, a minicomputer, a mainframe computer, a distributed computing environment that includes any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It should be noted that, in each specific embodiment of the present application, when related processing is required according to user information, user behavior data, user history data, user location information, and other data related to user identity or characteristics, permission or consent of the user is obtained first, and the collection, use, processing, and the like of the data comply with related laws and regulations and standards. In addition, when the embodiment of the application needs to acquire the sensitive personal information of the user, the independent permission or independent consent of the user is acquired through popup or jump to a confirmation page and the like, and after the independent permission or independent consent of the user is definitely acquired, the necessary relevant data of the user for enabling the embodiment of the application to normally operate is acquired.
Fig. 1 is a flowchart of a data replication method according to an embodiment of the present application, where the method in fig. 1 may include, but is not limited to, steps S101 to S105, where:
Step S101, configuration information is obtained, wherein the configuration information comprises data information of a source data table and identification of a target library, and the data information comprises a first table name and a table structure of the source data table.
The table structure may include field names and field types of the source data table. The source data table may be a table in a source database, which is a different database than the target database.
In the embodiment of the present application, the source data table may be a table in hive, and the first target table may be a table in impala.
Step S102, processing the table structure according to a first rule to obtain a first identifier.
The first rule may be set according to the actual situation and is not limited herein. For example, the first rule may be to process the table structure with a message-digest algorithm (Message-Digest Algorithm 5, MD5); the resulting hash value is called the first identifier. Specifically, the field names and field types of the table structure may be spliced into a single character string, and the spliced character string is processed with MD5 to obtain the first identifier.
The first rule may also be to process the table structure together with the partition information of the data partition of the source data table using MD5, where the data partition of the source data table refers to the storage area that stores the data of the source data table, and the partition information describes the storage address and/or the stored data of the data partition. For example, the field names, field types and partition information are spliced into a single character string, and the spliced character string is processed with MD5 to obtain the first identifier.
MD5 is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value (also called a digest) and is commonly used to check that information has been transferred completely and consistently. In this embodiment, MD5 may be replaced by another algorithm for generating hash values.
It should be noted that the first rule must take at least the table structure of the source data table as input, so that whether the table structure of the source data table has changed can be determined by comparing the generated hash values (i.e. the identifiers).
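As an illustration of one possible first rule, the sketch below splices field names and field types (and, optionally, partition information) into one string and hashes it with MD5 using Python's standard `hashlib`. The function name, the `:` and `|` separators and the `partition=` prefix are assumptions; the application does not specify how the string is spliced.

```python
import hashlib


def schema_fingerprint(columns, partition_info=None):
    """Return an MD5 hex digest for a table structure.

    columns: ordered list of (field_name, field_type) pairs.
    partition_info: optional string describing the table's data partition.
    """
    # Splice each field name and type into one string (separators are assumed).
    parts = [f"{name}:{ftype}" for name, ftype in columns]
    if partition_info is not None:
        parts.append(f"partition={partition_info}")
    spliced = "|".join(parts)
    return hashlib.md5(spliced.encode("utf-8")).hexdigest()


before = schema_fingerprint([("id", "bigint"), ("name", "string")])
after = schema_fingerprint([("id", "bigint"), ("name", "string"), ("age", "int")])
print(before != after)  # True: adding a field changes the identifier
```

Because the digest depends on field order in this sketch, reordering columns also counts as a structure change; whether that is desired depends on how the first rule is defined in practice.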
Step S103, matching the first table name with a second table name in the metadata record table to obtain a target record item, wherein the metadata record table comprises a plurality of record items, each record item comprises the second table name and a second identifier, the second identifier is obtained by processing a table structure corresponding to the second table name according to the first rule, and the second table name in the target record item is identical to the first table name.
For convenience of description, the table name in the record item is referred to as a second table name, and the identifier obtained by processing according to the first rule according to the table structure corresponding to the second table name is referred to as a second identifier. It should be noted that an entry may also include an identification of a table in the target library (e.g., the identification may be a table name, which may be the same as a second table name in the entry), but not every entry includes an identification of a table in the target library. For example, if the hive table has a corresponding impala table, the entry may include the table name of the hive table, the second identifier, the table name of the impala table, and partition information of the data partition of the impala table.
The metadata record table is searched according to the first table name to find the record item whose second table name is the same as the first table name; this record item is called the target record item.
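The lookup in step S103 can be sketched as a simple search over record items. The `RecordItem` class and its field names are hypothetical, chosen to mirror the fields the text says a record item may hold. The application does not describe what happens when no record item matches, so the sketch simply returns `None` in that case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecordItem:
    second_table_name: str
    second_identifier: str
    # Optional fields: not every record item names a table in the target library.
    target_table_name: Optional[str] = None
    partition_info: Optional[str] = None


def find_target_record(records: List[RecordItem], first_table_name: str) -> Optional[RecordItem]:
    """Return the record item whose second table name equals the first table name."""
    for item in records:
        if item.second_table_name == first_table_name:
            return item
    return None
```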
Step S104, if the second identifier of the target record item is different from the first identifier, a first target table is created for the source data table, wherein the first target table is a table in the target library, and the source data table and the first target table have the same table structure.
After the target record item is found, the second identifier in the target record item is compared with the first identifier. If the two are the same, the table structure of the source data table matches the record in the metadata record table, which means the table structure of the source data table has not changed. If the two are different, the table structure of the source data table differs from the record in the metadata record table, which means the table structure has changed and a first target table needs to be created for the source data table. For example, when the source data table is a Hive table and the second identifier in the target record item is different from the first identifier, an Impala table (namely the first target table) is created for the Hive table; the table structure of the created Impala table is identical to that of the Hive table and comprises the same field names and field types.
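The comparison and table creation in step S104 might look like the sketch below. It only builds a `CREATE TABLE` statement as a string; executing it against Impala, and mapping any Hive-only field types to Impala types, is outside the sketch. The function names and the choice to quote identifiers with backticks are assumptions.

```python
def structure_changed(first_identifier, second_identifier):
    """True when the stored identifier no longer matches the current one."""
    return first_identifier != second_identifier


def build_create_table_sql(target_db, table_name, columns):
    """Build a CREATE TABLE statement with the same field names and types.

    columns: ordered list of (field_name, field_type) pairs taken from the
    source data table's structure.
    """
    cols = ", ".join(f"`{name}` {ftype}" for name, ftype in columns)
    return f"CREATE TABLE `{target_db}`.`{table_name}` ({cols})"


if structure_changed("new-digest", "old-digest"):
    print(build_create_table_sql("dst", "orders", [("id", "bigint"), ("name", "string")]))
```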
Step S105, copying the data of the source data table to the first target table.
The source data table and the first target table are based on different HDFS storage, so that data coupling between the source data table and the first target table can be avoided. The data of the source data table may be business data, transaction data, payment data, etc. in the financial field, or may be data in other fields, which is not limited herein.
Specifically, the data of the source data table is stored in the data partition of the source data table, the first target table also has a corresponding data partition, and when the data is copied, the data in the data partition of the source data table is copied into the data partition of the first target table, so that the data copy of the source data table is realized.
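The partition-to-partition copy can be illustrated with local directories standing in for the data partitions. This is a stand-in only: in the setting described here the partitions would live on HDFS and be copied with HDFS tooling, which the application does not detail. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def copy_partition(src_dir, dst_dir):
    """Copy every data file from the source partition directory to the target one.

    Returns the names of the copied files in sorted order.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target partition if needed
    copied = []
    for f in sorted(src.iterdir()):
        if f.is_file():
            shutil.copy2(f, dst / f.name)  # copy file contents and metadata
            copied.append(f.name)
    return copied
```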
In steps S101 to S105 above, configuration information is acquired, the configuration information comprising data information of a source data table and an identifier of a target library, and the data information comprising a first table name and a table structure of the source data table. The table structure is processed according to a first rule to obtain a first identifier. The first table name is matched with the second table names in the metadata record table to obtain a target record item; the metadata record table comprises a plurality of record items, each record item comprises a second table name and a second identifier, the second identifier is obtained by processing the table structure corresponding to the second table name according to the first rule, and the second table name in the target record item is the same as the first table name. If the second identifier of the target record item differs from the first identifier, a first target table is created for the source data table, the first target table being a table in the target library with the same table structure as the source data table, and the data of the source data table is copied into the first target table. Through this process, when the second identifier in the target record item differs from the first identifier (that is, when the table structure of the source data table has changed), the data of the source data table is copied into the first target table, so that the source data table and the first target table are synchronized. When the scheme is applied in big data settings that use Hive and Impala, inconsistency between the data of the Hive table and the Impala table can be avoided.
Referring to fig. 2, in some embodiments, step S104 may include, but is not limited to, steps S1041 to S1042:
Step S1041, when the second identifier of the target record item is different from the first identifier and the target record item satisfies a first condition, creating the first target table for the source data table, together with a data partition corresponding to the first target table, where the table structure of the source data table is the same as the table structure of the first target table. The first condition is that the target record item does not include the identifier of a table in the target library, or that the target record item includes the identifier of a table in the target library.
If a target table has already been created for the first table name of the source data table, the identifier of the created target table (which may be its table name) is recorded in the target record item; in this case, the target record item includes the identifier of a table in the target library. If no target table has been created for the first table name, no such identifier is recorded in the target record item; in this case, the target record item does not include the identifier of a table in the target library.
When the first target table is created, the table structure of the first target table is created according to the table structure of the source data table, and the table name of the first target table can be set to be the first table name, so that a user can use the source data table and the first target table which are located in different databases according to the same table name.
The data partition corresponding to the first target table is used for storing the data of the first target table.
In step S1042, the table name of the first target table and the partition information of the data partition corresponding to the first target table are recorded in the metadata record table.
The partition information of the data partition corresponding to the first target table is used for describing the storage address and/or the stored data of the data partition. The partition information of the data partition corresponding to the first target table may include an identifier of the data partition, and a storage address of the data partition may be determined according to the identifier.
Accordingly, step S105, copying the data of the source data table to the first target table, includes:
step S1051, copying the data in the data partition corresponding to the source data table to the data partition corresponding to the first target table.
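A partition-level copy such as step S1051 can be sketched as below. The sketch assumes that each data partition is a directory of files and uses local directories in place of the real storage addresses; the function name `copy_partition` and the `dt=2024-01-01` partition layout are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_partition(source_partition: Path, target_partition: Path) -> int:
    """Copy every file of one source data partition into the target data partition.

    Returns the number of files copied. Local directories stand in for the
    storage addresses of the partitions (an assumption of this sketch).
    """
    target_partition.mkdir(parents=True, exist_ok=True)
    copied = 0
    for path in sorted(source_partition.iterdir()):
        if path.is_file():
            shutil.copy2(path, target_partition / path.name)
            copied += 1
    return copied


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    src = Path(root) / "source" / "dt=2024-01-01"
    dst = Path(root) / "target" / "dt=2024-01-01"
    src.mkdir(parents=True)
    (src / "part-00000").write_text("row1\nrow2\n")
    copied_count = copy_partition(src, dst)
    copied_text = (dst / "part-00000").read_text()
```

In a deployment where the partitions live on a distributed file system, the same loop would call that file system's copy operation instead of `shutil.copy2`.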
In steps S1041 to S1042 and step S1051 illustrated in this embodiment, when the second identifier of the target record item is different from the first identifier and the target record item satisfies the first condition, the table structure of the source data table has changed. In this case, regardless of whether a corresponding table has previously been created for the source data table, a first target table needs to be created for it, where the first target table is the table in the target library corresponding to the source data table. The data in the data partition corresponding to the source data table is then copied to the data partition corresponding to the first target table, so that the source data table and the first target table are synchronized. When the scheme is applied in the big data field using Hive and Impala, inconsistency between the data of the Hive table and the Impala table can be avoided.
Referring to fig. 3, in some embodiments, step S1042, recording in the metadata record table the table name of the first target table and the partition information of the data partition corresponding to the first target table, may include, but is not limited to, steps S10421 to S10422:
In step S10421, if the target record item does not include the identifier of the table in the target library, the table name of the first target table and the partition information of the data partition corresponding to the first target table are recorded in the target record item.
If the target record item does not include the identifier of a table in the target library, no corresponding table has yet been created for the source data table. In this case, after the first target table is created for the source data table, the table name of the first target table and the partition information of its data partition are recorded in the target record item. When the target record item is later found in the metadata record table by the first table name, the history of the first target table created for the first table name is available, which helps maintain data consistency.
In step S10422, if the target record item includes the identifier of a table in the target library, a record item is newly added to the metadata record table, where the newly added record item includes the first table name, the first identifier, the table name of the first target table, and the partition information of the data partition corresponding to the first target table.
If the target record item includes the identifier of a table in the target library, a corresponding table has already been created for the source data table. In this case, because the table structure of the source data table has changed, the previously created table can no longer be used, and a new first target table needs to be created for the source data table. A record item is then added to the metadata record table, including the first table name, the first identifier, the table name of the first target table, and the partition information of the data partition corresponding to the first target table. When the metadata record table is later searched by the first table name, the history of the first target table created for the first table name is available, which helps maintain data consistency.
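The two recording branches, steps S10421 and S10422, can be sketched as a single function. The `RecordItem` fields, the function name `record_first_target_table`, and the list that stands in for the metadata record table are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecordItem:
    first_table_name: str                    # table name of the source data table
    identifier: str                          # identifier derived from the table structure
    target_table_name: Optional[str] = None  # table in the target library, if created
    partition_info: Optional[str] = None     # partition information of that table


def record_first_target_table(records: List[RecordItem], target_record: RecordItem,
                              first_identifier: str, target_table_name: str,
                              partition_info: str) -> RecordItem:
    """Update the target record item (S10421) or append a new one (S10422)."""
    if target_record.target_table_name is None:
        # S10421: no table was created before, so fill in the existing item.
        target_record.target_table_name = target_table_name
        target_record.partition_info = partition_info
        return target_record
    # S10422: a table already exists, so keep the old item and add a new one.
    new_item = RecordItem(target_record.first_table_name, first_identifier,
                          target_table_name, partition_info)
    records.append(new_item)
    return new_item


records = [RecordItem("user_basic", "id-v1")]
record_first_target_table(records, records[0], "id-v2", "user_basic", "dt=2024-01-01")
record_first_target_table(records, records[0], "id-v3", "user_basic_081256", "dt=2024-01-02")
```

The text does not say whether the identifier stored in the existing item is refreshed in the S10421 branch, so the sketch leaves it untouched.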
In some embodiments, in the event that the target entry does not include an identification of a table in the target library, the table name of the first target table is the same as the first table name;
And under the condition that the target record item comprises the identification of the table in the target library, the table name of the first target table adopts a temporary table name, and the temporary table name is determined according to the first table name and the time for creating the first target table.
In the foregoing, the target record item does not include an identifier of the table in the target library, which indicates that the table in the corresponding target library has not been created for the first table name previously, in this case, the first target table is directly created, and the table name of the first target table is set to be the same as the first table name, so that the user may use the source data table and the first target table located in different databases according to the same table name.
If the target record item includes the identifier of a table in the target library, a table in the target library has already been created for the first table name. In this case, because the table structure of the source data table has changed, the previously created table can no longer be used, and a table needs to be created again for the source data table. To distinguish it from the previously created table (whose table name is the same as the first table name), the first target table created here uses a temporary table name, which is determined according to the first table name and the time at which the first target table is created. For example, if the first table name is "user basic table" and the first target table is created at 8:12:56, the two are concatenated to obtain the temporary table name "user basic table+8:12:56". After the target record item is deleted (i.e., after the information about the previously created table is deleted from the metadata record table), the temporary table name may be changed to the first table name, so that the user can use the source data table and the first target table, located in different databases, under the same table name.
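As a small sketch of the naming rule, the temporary table name can be built from the first table name and the creation time. The underscore separator and the `HHMMSS` suffix format are assumptions chosen so that the result is a valid identifier in most SQL engines; the text only requires that the name be derived from the first table name and the creation time. The date in the example is arbitrary.

```python
from datetime import datetime


def temporary_table_name(first_table_name: str, created_at: datetime) -> str:
    """Join the first table name and the creation time into a temporary table name."""
    return f"{first_table_name}_{created_at.strftime('%H%M%S')}"


name = temporary_table_name("user_basic", datetime(2024, 10, 28, 8, 12, 56))
print(name)  # user_basic_081256
```

A suffix with second-level resolution can collide if two runs start within the same second, so a real deployment might include the date or a finer-grained timestamp.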
Referring to fig. 4, after copying the data of the source data table into the first target table in step S105, the method further includes step S106 or step S106':
step S106, when the target record item does not include the identification of the target table in the target library, the partition information of the first target table is recorded in a partition information table;
Step S106', in the case that the target record item includes the identifier of a table in the target library, deleting the partition information of an old table from the partition information table, where the old table is the table in the target library included in the target record item; deleting the target record item and the data stored in the data partition corresponding to the old table (the old table is the table most recently created for the first table name, and deleting it may include deleting its definition, for example, its table name and table structure); changing the temporary table name in the newly added record item in the metadata record table to the first table name; and recording the partition information of the first target table in the partition information table;
the partition information table comprises a plurality of information items, and each information item is used for describing the storage address and/or the stored data of the corresponding data partition of one table.
Each information item may include a unique identification of a table (for example, the name of the database where the table is located plus the table name) and the partition information of the table, where the partition information records the storage address of each data partition of the corresponding table and/or information about the data stored at each storage address, such as data size and storage time.
Partition information of the source data table is also recorded in the partition information table, and the data of the source data table can be obtained according to that partition information.
In the foregoing, the partition information of the first target table is recorded in the partition information table, and then when the data of the first target table is acquired, the data storage address of the first target table may be determined according to the record in the partition information table, so that the data is taken out according to the data storage address.
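The bookkeeping of steps S106 and S106' can be sketched with plain dictionaries. Everything named here is hypothetical: the dictionary keys, the `finalize_copy` function, and the `database.table` key format (which follows the "database name plus table name" example for the unique identification). Dropping the old table and its data files is left out, since it depends on the target library.

```python
def finalize_copy(metadata_records, partition_info_table, target_record,
                  new_record, new_partition_info, target_library):
    """Apply step S106 or S106' after the data has been copied."""
    first_table_name = target_record["source_table"]
    new_key = f"{target_library}.{first_table_name}"
    if new_record is target_record:
        # S106: no table existed before; only record the new partition information.
        partition_info_table[new_key] = new_partition_info
        return
    # S106': remove the partition information of the old table ...
    old_key = f"{target_library}.{target_record['target_table']}"
    partition_info_table.pop(old_key, None)
    # ... delete the target record item from the metadata record table ...
    metadata_records[:] = [r for r in metadata_records if r is not target_record]
    # ... change the temporary table name to the first table name ...
    new_record["target_table"] = first_table_name
    # ... and record the partition information of the first target table.
    partition_info_table[new_key] = new_partition_info


old = {"source_table": "user_basic", "identifier": "old-id", "target_table": "user_basic"}
new = {"source_table": "user_basic", "identifier": "new-id", "target_table": "user_basic_081256"}
records = [old, new]
partitions = {"impala_db.user_basic": ["dt=2024-01-01"]}
finalize_copy(records, partitions, old, new, ["dt=2024-01-01", "dt=2024-01-02"], "impala_db")
```

Because the renamed table takes over the unique identification of the old one, the old entry is removed before the new one is written under the same key.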
Referring to fig. 5, in some embodiments of the present application, the target record further includes partition information of a data partition corresponding to a second target table, where the second target table is a table in the target library, and accordingly, in step S103, after matching the first table name with the second table name in the metadata record table to obtain the target record, the method further includes step S107 and step S108:
Step S107, if the second identifier of the target record item is the same as the first identifier, obtaining partition information of a data partition corresponding to the second target table from the target record item;
And S108, copying the data in the data partition corresponding to the source data table into the data partition corresponding to the second target table.
If the second identifier of the target record item is the same as the first identifier, it is indicated that the table structure of the source data table is not changed, and the data copying is directly performed, and the data partition corresponding to the source data table may be determined according to the partition information table, so as to take out the data, and the partition information of the data partition corresponding to the second target table is obtained from the target record item, and then the data of the source data table is copied into the data partition corresponding to the second target table, so as to realize the data synchronization of the source data table and the second target table.
Taking data export from a Hive table to an Impala table as an example, the data replication method provided by the embodiment of the present application is illustrated below. Fig. 6 is a flowchart of the data replication method provided by the embodiment of the present application.
(1) A Spark execution environment is created, the configuration information is read, and the configuration information is checked for correctness, for example, whether the data source is Hive and whether the target library to be written is Impala.
(2) The table structure of the Hive table is queried, and all field names, field types and partition information are parsed; these are concatenated and sorted, and an MD5 value (namely the first identifier) is calculated from the result. The MD5 value (i.e., the second identifier) corresponding to the table name of the Hive table is queried from the metadata record table (table_meta_info), and the two MD5 values are compared:
1) If the two values are consistent, the table structure of the Hive table is unchanged, and the existing Impala table does not need to be adjusted;
2) If the two values are inconsistent and the Hive table has no corresponding Impala table, the current table metadata (for example, the table name of the Hive table) is written into the metadata record table, an Impala table corresponding to the Hive table and its data partition are created, and the metadata change flag is set to update_flag=true;
3) If the two values are inconsistent and an Impala table corresponding to the Hive table exists, the table structure of the Hive table has changed; an Impala temporary table corresponding to the Hive table and its data partition are created (the temporary table is named with the Hive table name plus a time suffix), and the metadata change flag is set to update_flag=true;
(3) The partition files of the Hive table are copied to the HDFS partition directory corresponding to the Impala table;
(4) Whether the metadata has been updated is determined according to update_flag, specifically:
1) If the metadata has changed (update_flag=true), the historical partition synchronization records of the Hive table (i.e., the partition information of the old table) are deleted from the partition information table (table_parts_info), the old Impala table and its data files are deleted with a drop command, the Impala temporary table is renamed to the formal table name (i.e., the first table name) with an ALTER TABLE ... RENAME TO command, the partition information corresponding to the Impala table is recorded in the partition information table, and the metadata of the Impala table is refreshed (i.e., the records in the metadata record table are updated, and the temporary table name recorded in the metadata record table is changed to the first table name).
2) If the metadata has not changed (update_flag=false), the partition information table (table_parts_info) is queried to determine whether a record for the current partition of the table being synchronized (i.e., the partition information of the Impala table) exists. If it exists, neither the metadata nor the partition information has changed, and no operation is needed; if it does not exist, the partition information corresponding to the Impala table is updated and the metadata of the Impala table is refreshed.
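The MD5 comparison in step (2) can be sketched as follows. The exact concatenation format is not given in the text, so the `field:`/`partition:` prefixes, the `|` separator and the function name `table_structure_md5` are assumptions; only the general idea (sort, concatenate, hash) comes from the description.

```python
import hashlib


def table_structure_md5(fields, partition_columns):
    """Fingerprint a table structure.

    fields: iterable of (field_name, field_type) pairs.
    partition_columns: iterable of (column_name, column_type) pairs.
    """
    parts = [f"field:{name}:{ftype}" for name, ftype in fields]
    parts += [f"partition:{name}:{ptype}" for name, ptype in partition_columns]
    joined = "|".join(sorted(parts))  # sorting makes the result independent of column order
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


original = table_structure_md5([("id", "bigint"), ("name", "string")], [("dt", "string")])
reordered = table_structure_md5([("name", "string"), ("id", "bigint")], [("dt", "string")])
retyped = table_structure_md5([("id", "int"), ("name", "string")], [("dt", "string")])
print(original == reordered, original == retyped)
```

Sorting before hashing means that a pure reordering of columns is not treated as a structure change; if column order matters for the target table, the sort would have to be dropped.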
In addition, each time data is exported, the scheme automatically detects metadata changes of the Hive table (namely, whether the table structure and partition information of the Hive table have changed); when a change is detected, the corresponding Impala table is re-created from the changed information without manual intervention, which ensures data consistency while improving processing efficiency.
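For the table swap in step (4) 1), the statements issued against Impala might look like the ones generated below. This is a hedged sketch: the function name `swap_statements` and the example names are hypothetical, the use of `REFRESH` for the metadata refresh is an assumption (the text does not name the command), and the table names are assumed to be trusted identifiers that need no quoting.

```python
def swap_statements(database: str, formal_name: str, temporary_name: str):
    """Return SQL strings that replace the old table with the temporary one."""
    return [
        # Drop the old table together with its data files (managed table assumed).
        f"DROP TABLE IF EXISTS {database}.{formal_name}",
        # Give the temporary table the formal (first) table name.
        f"ALTER TABLE {database}.{temporary_name} RENAME TO {database}.{formal_name}",
        # Reload the metadata so newly copied partition files become visible.
        f"REFRESH {database}.{formal_name}",
    ]


statements = swap_statements("impala_db", "user_basic", "user_basic_081256")
for statement in statements:
    print(statement)
```

Generating the statements as a list keeps the ordering explicit: the old table must be dropped before the rename, otherwise the rename would collide with the existing formal name.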
Referring to fig. 7, an embodiment of the present application further provides a data replication device, which can implement the above data replication method, where the data replication device 900 includes:
a first obtaining module 901, configured to obtain configuration information, where the configuration information includes data information of a source data table and an identifier of a target library, and the data information includes a first table name and a table structure of the source data table;
a second obtaining module 902, configured to process the table structure according to a first rule to obtain a first identifier;
the matching module 903 is configured to match the first table name with a second table name in the metadata record table to obtain a target record item, where the metadata record table includes a plurality of record items, each record item includes a second table name and a second identifier, the second identifier is obtained by processing a table structure corresponding to the second table name according to the first rule, and the second table name in the target record item is the same as the first table name;
a creating module 904, configured to create a first target table for the source data table if the second identifier of the target record item is different from the first identifier, where the first target table is a table in the target library, and the source data table and the first target table have the same table structure;
a first data copying module 905, configured to copy the data of the source data table into the first target table.
In some embodiments, the creation module 904 includes:
The creating sub-module is used for creating the first target table for the source data table, together with a data partition corresponding to the first target table, when the second identifier of the target record item is different from the first identifier and the target record item satisfies a first condition, where the table structure of the source data table is the same as that of the first target table, and the first condition is that the target record item does not include the identifier of a table in the target library, or that the target record item includes the identifier of a table in the target library;
A recording sub-module, configured to record, in the metadata record table, a table name of the first target table and partition information of a data partition corresponding to the first target table;
the first data copying module 905 is specifically configured to copy data in a data partition corresponding to the source data table to a data partition corresponding to the first target table.
In some embodiments, the recording sub-module includes:
The first recording unit is used for recording the table name of the first target table and the partition information of the data partition corresponding to the first target table in the target record item if the target record item does not comprise the identification of the table in the target library;
And the second recording unit is used for adding a record item in the metadata record table if the target record item comprises the identifier of the table in the target library, wherein the added record item comprises the first table name, the first identifier, the table name of the first target table and partition information of a data partition corresponding to the first target table.
In some embodiments, in the event that the target entry does not include an identification of a table in the target library, the table name of the first target table is the same as the first table name;
And under the condition that the target record item comprises the identification of the table in the target library, the table name of the first target table adopts a temporary table name, and the temporary table name is determined according to the first table name and the time for creating the first target table.
In some embodiments, the data replication device 900 further comprises:
the first processing module is used for recording partition information of the first target table in a partition information table under the condition that the target record item does not comprise the identification of the target table in the target library;
The second processing module is configured to delete partition information of an old table in the partition information table, delete data stored in a data partition corresponding to the old table and the target record, change a temporary table name in a newly added record in the metadata record into the first table name, and record partition information of the first target table in the partition information table, where the old table is a table in the target library included in the target record;
the partition information table comprises a plurality of information items, and each information item is used for describing the storage address and/or the stored data of the corresponding data partition of one table.
In some embodiments, the target record further includes partition information of a data partition corresponding to a second target table, where the second target table is a table in the target library;
The data replication device 900 further includes:
The third acquisition module is used for acquiring partition information of a data partition corresponding to the second target table from the target record item if the second identifier of the target record item is the same as the first identifier;
And the second data copying module is used for copying the data in the data partition corresponding to the source data table into the data partition corresponding to the second target table.
In some embodiments, the source data table is a Hive table and the first target table is an Impala table.
The data replication device 900 can implement the data replication method and achieve the same technical effects, and the description in the embodiment of the data replication method may be specifically referred to, which is not described herein again.
The embodiment of the application also provides electronic equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the data copying method when executing the computer program. The electronic equipment can be any intelligent terminal including a tablet personal computer, a vehicle-mounted computer and the like.
Referring to fig. 8, fig. 8 is a schematic hardware structure of an electronic device according to an embodiment of the present application, where the electronic device includes:
The processor 1001 may be implemented by a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided by the embodiments of the present application;
Memory 1002 may be implemented in the form of read-only memory (Read-Only Memory, ROM), a static storage device, a dynamic storage device, or random access memory (Random Access Memory, RAM). The memory 1002 may store an operating system and other application programs. When the technical solutions provided in the embodiments of the present disclosure are implemented by software or firmware, the relevant program code is stored in the memory 1002 and invoked by the processor 1001 to execute the data replication method of the embodiments of the present disclosure;
an input/output interface 1003 for implementing information input and output;
the communication interface 1004 is configured to implement communication interaction between the present device and other devices, and may implement communication in a wired manner (such as USB, network cable, etc.), or may implement communication in a wireless manner (such as mobile network, Wi-Fi, Bluetooth, etc.);
a bus 1005 for transferring information between the various components of the device (e.g., the processor 1001, memory 1002, input/output interface 1003, and communication interface 1004);
Wherein the processor 1001, the memory 1002, the input/output interface 1003, and the communication interface 1004 realize communication connection between each other inside the device through the bus 1005.
The embodiment of the application also provides a computer readable storage medium, which stores a computer program, and the computer program realizes the data copying method when being executed by a processor.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The present application also provides a computer program product comprising computer programs/instructions which when executed by one or more processors implement the steps of the data replication method of any of the embodiments of the present application.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by persons skilled in the art that the embodiments of the application are not limited by the illustrations, and that more or fewer steps than those shown may be included, or certain steps may be combined, or different steps may be included.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: only A exists, only B exists, and both A and B exist, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" or a similar expression means any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b or c may represent a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be single or plural.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is merely a logical function division, and there may be another division manner in actual implementation, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including multiple instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method of the various embodiments of the present application. The storage medium includes various media capable of storing programs, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, and are not thereby limiting the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (10)

1. A data replication method, characterized in that the method comprises:
acquiring configuration information, the configuration information including data information of a source data table and an identifier of a target database, the data information including a first table name and a table structure of the source data table;
processing the table structure according to a first rule to obtain a first identifier;
matching the first table name with a second table name in a metadata record table to obtain a target record item, wherein the metadata record table includes multiple record items, each record item includes a second table name and a second identifier, the second identifier is obtained by processing the table structure corresponding to the second table name according to the first rule, and the second table name in the target record item is the same as the first table name;
if the second identifier of the target record item is different from the first identifier, creating a first target table for the source data table, wherein the first target table is a table in the target database, and the source data table and the first target table have the same table structure; and
copying the data of the source data table to the first target table.

2. The data replication method according to claim 1, characterized in that creating the first target table for the source data table if the second identifier of the target record item is different from the first identifier comprises:
when the second identifier of the target record item is different from the first identifier and the target record item satisfies a first condition, creating, for the source data table, the first target table and a data partition corresponding to the first target table, the table structure of the source data table being the same as the table structure of the first target table, wherein the first condition includes: the target record item does not include an identifier of a table in the target database; or the target record item includes an identifier of a table in the target database; and
recording, in the metadata record table, the table name of the first target table and the partition information of the data partition corresponding to the first target table;
and copying the data of the source data table to the first target table comprises:
copying the data in the data partition corresponding to the source data table to the data partition corresponding to the first target table.

3. The data replication method according to claim 2, characterized in that recording, in the metadata record table, the table name of the first target table and the partition information of the data partition corresponding to the first target table comprises:
if the target record item does not include an identifier of a table in the target database, recording, in the target record item, the table name of the first target table and the partition information of the data partition corresponding to the first target table; and
if the target record item includes an identifier of a table in the target database, adding a new record item to the metadata record table, the new record item including the first table name, the first identifier, the table name of the first target table, and the partition information of the data partition corresponding to the first target table.

4. The data replication method according to claim 3, characterized in that:
when the target record item does not include an identifier of a table in the target database, the table name of the first target table is the same as the first table name; and
when the target record item includes an identifier of a table in the target database, the table name of the first target table is a temporary table name, the temporary table name being determined according to the first table name and the time at which the first target table is created.

5. The data replication method according to claim 4, characterized in that, after copying the data of the source data table to the first target table, the method further comprises:
when the target record item does not include an identifier of a table in the target database, recording the partition information of the first target table in a partition information table; and
when the target record item includes an identifier of a table in the target database, deleting the partition information of an old table from the partition information table, deleting the target record item and the data stored in the data partition corresponding to the old table, changing the temporary table name in the new record item in the metadata record table to the first table name, and recording the partition information of the first target table in the partition information table, the old table being the table in the target database that is included in the target record item;
wherein the partition information table includes a plurality of information items, each information item describing the storage address and/or the stored data of a data partition corresponding to one table.

6. The data replication method according to claim 1, characterized in that the target record item further includes partition information of a data partition corresponding to a second target table, the second target table being a table in the target database; and
after matching the first table name with the second table name in the metadata record table to obtain the target record item, the method further comprises:
if the second identifier of the target record item is the same as the first identifier, obtaining, from the target record item, the partition information of the data partition corresponding to the second target table; and
copying the data in the data partition corresponding to the source data table to the data partition corresponding to the second target table.

7. The data replication method according to any one of claims 1 to 6, characterized in that the source data table is a Hive table and the first target table is an Impala table.

8. A data replication device, characterized in that the device comprises:
a first acquisition module, configured to acquire configuration information, the configuration information including data information of a source data table and an identifier of a target database, the data information including a first table name and a table structure of the source data table;
a second acquisition module, configured to process the table structure according to a first rule to obtain a first identifier;
a matching module, configured to match the first table name with a second table name in a metadata record table to obtain a target record item, wherein the metadata record table includes multiple record items, each record item includes a second table name and a second identifier, the second identifier is obtained by processing the table structure corresponding to the second table name according to the first rule, and the second table name in the target record item is the same as the first table name;
a creation module, configured to create a first target table for the source data table if the second identifier of the target record item is different from the first identifier, wherein the first target table is a table in the target database, and the source data table and the first target table have the same table structure; and
a data replication module, configured to copy the data of the source data table to the first target table.

9. An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory stores a computer program, and the processor implements the data replication method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the data replication method according to any one of claims 1 to 7.
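
The decision logic recited in claims 1, 4 and 6 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The claims do not define the "first rule", so the sketch assumes a SHA-256 digest over an ordered list of column names and types. The names `RecordItem`, `structure_identifier` and `plan_replication`, the `_tmp_` naming pattern for the temporary table, and the `no_record` outcome for a table name with no matching record item are all hypothetical and introduced only for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RecordItem:
    """One row of the metadata record table (hypothetical layout)."""
    table_name: str                     # the "second table name"
    structure_id: str                   # the "second identifier"
    target_table: Optional[str] = None  # table in the target database, if one exists


def structure_identifier(columns: List[Tuple[str, str]]) -> str:
    # Assumed "first rule": SHA-256 over an ordered "name:type" column list.
    canonical = ",".join(f"{name}:{ctype}" for name, ctype in columns)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_replication(
    table_name: str,
    columns: List[Tuple[str, str]],
    records: List[RecordItem],
    created_at: str,
) -> Tuple[str, Optional[str]]:
    """Return (action, target table name) for one source data table."""
    first_id = structure_identifier(columns)
    # Match the first table name against the second table names.
    target = next((r for r in records if r.table_name == table_name), None)
    if target is None:
        # Not covered by the claims; reported separately in this sketch.
        return ("no_record", None)
    if target.structure_id == first_id:
        # Claim 6: structure unchanged, copy into the existing target table.
        return ("reuse", target.target_table)
    if target.target_table is None:
        # Claim 4, first branch: new table takes the source table name.
        return ("create", table_name)
    # Claim 4, second branch: temporary name from table name and creation time.
    return ("create_temp", f"{table_name}_tmp_{created_at}")
```

A caller would pass the table name and column list read from the configuration information, together with the rows of the metadata record table, and then create the table and copy partitions according to the returned action. The rename of the temporary table in claim 5 would follow a successful copy.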
CN202411514062.4A 2024-10-28 2024-10-28 Data replication method, device, electronic device and storage medium Pending CN119474089A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411514062.4A CN119474089A (en) 2024-10-28 2024-10-28 Data replication method, device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411514062.4A CN119474089A (en) 2024-10-28 2024-10-28 Data replication method, device, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN119474089A (en) 2025-02-18

Family

ID=94574493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411514062.4A Pending CN119474089A (en) 2024-10-28 2024-10-28 Data replication method, device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN119474089A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination