CN114036129A

CN114036129A - Database switching method for reducing data loss

Info

Publication number: CN114036129A
Application number: CN202111315451.0A
Authority: CN
Inventors: 刘小龙; 唐军; 周文龙
Original assignee: Sichuan Cric Technology Co ltd
Current assignee: Sichuan Cric Technology Co ltd
Priority date: 2021-11-08
Filing date: 2021-11-08
Publication date: 2022-02-11

Abstract

The invention discloses a database switching method for reducing data loss, which is characterized in that by combining the characteristics of semi-synchronous replication and MHA cluster schemes, in a master-slave architecture, a replicated cluster is required to have at least three database servers, and if only one slave node receives the latest binary log, the latest binary log can be applied to all other slave servers. On the basis, a third-party log server is introduced, and when the main library transaction is submitted, the logs of the local and third-party log servers are written simultaneously. Even if the semi-synchronous component fails and the main database server cannot access the semi-synchronous component, the logs can still be obtained from the third-party log server, so that the risk of data loss caused by the missing of the binary logs in the whole database switching process is avoided.

Description

Database switching method for reducing data loss

Technical Field

The invention relates to the technical field of databases, in particular to a database switching method for reducing data loss.

Background

The available architecture is basically standard for internet services, and both application services and database services need to be highly available. For a system, many modules may be included, such as front-end applications, caches, databases, message queues, etc., and each module needs to be highly available to ensure high availability of the whole system. For database services, high availability may be more complicated, and services available to users, not only access, but also need to have correctness guarantee, so when discussing a high availability scheme for a database, data consistency problems in the scheme are generally considered at the same time.

In the current high-availability architecture of the MySQL database, semi-synchronous replication is often used to ensure the data consistency of database switching. In a master-slave architecture, when a master library commits a transaction, the commit needs to be completed after receiving feedback from the slave library written in the log. But in the data synchronization process, semi-synchronization may degrade to asynchronous replication due to network problems, large transaction problems, etc. In the asynchronous replication, the transaction submitted by the master library is only guaranteed to be written into the log of the master server, and the real-time switching between the master and the slave can cause that the slave library does not have the latest data with the master library. Data consistency cannot be guaranteed if it relies entirely on asynchronous replication or semi-synchronous replication components. In many MHA cluster solutions, trying to save binary logs from a down main server during automatic failover ensures that data is not lost to the greatest extent, but this is not always feasible. If the main server hardware fails or cannot be accessed through ssh, the MHA cannot save the binary log, and only failover is performed, and the latest data is lost.

Disclosure of Invention

The present invention provides a database switching method for reducing data loss to solve the above problems, and combines the features of semi-synchronous replication and MHA clustering schemes, and in a master-slave architecture, requires that there must be at least three database servers in a replication cluster, and if only one slave node has received the latest binary log, the latest binary log can be applied to all other slave servers. On the basis, a third-party log server is introduced, and when the main library transaction is submitted, the logs of the local and third-party log servers are written simultaneously. Even if the semi-synchronous component fails and the main database server cannot access the semi-synchronous component, the logs can still be obtained from the third-party log server, so that the risk of data loss caused by the missing of the binary logs in the whole database switching process is avoided.

The invention realizes the purpose through the following technical scheme:

a database switching method for reducing data loss comprises the following steps:

step 1, a multi-node database cluster based on semi-synchronous replication is established, and a synchronous timeout mechanism is added;

step 2, accessing a binary log server, and storing incremental data generated by a main database in real time;

and 3, creating a third-party management node to monitor the cluster state, and deploying a database switching script.

In the step 1, a database cluster with at least three nodes is created and configured as a semi-synchronous replication mode, and a certain timeout time is set for synchronization, i.e. the database cluster is stepped back to an asynchronous replication mode.

In the step 1, if the transaction of the client in the master library is always in a waiting state due to the semi-synchronous replication, a synchronous timeout mechanism needs to be added, if the waiting time exceeds the configured timeout time in the waiting process, and no slave node notifies the current transaction, the master library is converted into asynchronous replication, and when at least one semi-synchronous slave node catches up, the master library is converted into semi-synchronous replication.

Further, in the step 2, the binary log of the database records all statements of the data modification operation, and records the statements in the form of events.

A further scheme is that in the step 2, a binary log server is accessed, a binary log on an appointed database instance is pulled to be stored locally in real time, a daemon process is started on the log server, and an IO thread simulated as a slave library is connected with a master library; after receiving the request, the master library creates a dump thread and pushes the binary logs of the master library to other slave library nodes and a log server in the cluster respectively; and directly restoring the latest data from the binary log of the slave library by using the SQL thread, and storing the binary log to the log server for incremental data restoration in the database switching process.

In step 3, heartbeat detection is performed on the cluster by using a third-party management node, and the states of all database cluster nodes are regularly monitored, including whether each server node can be connected, whether each database instance can be connected, and whether the master-slave replication state is normal.

In the step 3, a script for database fault switching is deployed on the management node, when the main library fails, the script is firstly connected to the main library server to store the binary log to the management node, and if the main library fails to be connected, the script is used for obtaining the log of the incremental data from the binary log server; then judging whether the semi-synchronous replication of the cluster nodes is normal or not, directly recovering the local log in a semi-synchronous mode, and recovering the binary log stored in the management node in an asynchronous mode; and finally, the slave library recovered to the latest data is promoted to be the master library, and the database switching process is completed.

The invention has the beneficial effects that:

according to the database switching method for reducing data loss, the third-party management node monitors the semi-synchronous replication cluster of the database, and then the semi-synchronous replication cluster is accessed to the independent binary log server to store incremental data, so that the risk of data loss in the MySQL database fault switching process is reduced.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the following briefly introduces the embodiments or the drawings needed to be practical in the prior art description, and obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is an architecture diagram of the nodes of a cluster.

FIG. 2 is a flow diagram of managing node data recovery.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.

In any embodiment, as shown in fig. 1-2, a database switching method for reducing data loss according to the present invention includes the following steps:

creating a database cluster with at least three nodes, configuring the database cluster into a semi-synchronous replication mode, and setting a certain timeout time for synchronization, wherein the timeout time is stepped back to the asynchronous replication mode. MySQL is asynchronous by default, and the master library returns the result to the client immediately after executing the transaction submitted by the client, and does not care whether the slave library has been received and processed. This has the problem that if the master is hung up, the transaction that has been committed on the master may not be passed to the slave, and at this point a database switch may result in incomplete data on the new master. And by using semi-synchronous replication, the master library is not returned to the client immediately after the transaction submitted by the client is executed, but is returned to the client after at least one slave library is received and written into a local log, so that the safety of data synchronization is improved.

But the use of semi-synchronous replication in some specific scenarios may result in the client's transactions at the home base being always waiting because of the need to add a synchronization timeout mechanism, possibly for network reasons or for the impact of large transactions. If the waiting time exceeds the configured timeout time in the waiting process, no slave node informs the current transaction, the master library is converted into asynchronous replication, and when at least one semi-synchronous slave node catches up, the master library is converted into semi-synchronous replication.

the binary log of the database records all statements of data modification operation, records the statements in the form of events, and can restore the database to any time point by the aid of the complete binary log. In the cluster node, data synchronization is also realized through synchronization of the binary logs. Although the semi-synchronous replication mode is adopted, the condition is degraded into asynchronous replication in the period of cluster operation. There is still a risk of data loss when a database failover occurs under asynchronous replication conditions.

And when the binary log server is accessed, the binary log on the specified database instance is pulled to be stored locally in real time. Starting a daemon process on the log server, and simulating an IO thread of a slave library to be connected with the master library. And after receiving the request, the master library creates a dump thread and pushes the binary logs of the master library to other slave library nodes and log servers in the cluster respectively. And directly restoring the latest data from the binary log of the slave library by using the SQL thread, and storing the binary log to the log server for incremental data restoration in the database switching process.

Step 3, establishing a third-party management node to monitor the cluster state, and deploying a database switching script;

and performing heartbeat detection on the cluster by using a third-party management node, and regularly monitoring the states of all database cluster nodes, including whether each server node can be communicated, whether each database instance can be connected, whether the master-slave replication state is normal and the like. The management process is independent outside the cluster nodes, so that on one hand, the split problem caused by network partition faults is avoided, and on the other hand, one set of management nodes can simultaneously manage a plurality of sets of database replication clusters, so that the whole management platform architecture is clearer and more stable.

The method comprises the following steps that a script for switching database failures is also deployed on a management node, when a main library fails, the script is firstly connected to a main library server to store a binary log to the management node, and if the main library fails to be connected, the binary log server obtains the log of incremental data; then judging whether the semi-synchronous replication of the cluster nodes is normal or not, directly recovering the local log in a semi-synchronous mode, and recovering the binary log stored in the management node in an asynchronous mode; and finally, the slave library recovered to the latest data is promoted to be the master library, and the database switching process is completed.

In a specific embodiment, as shown in fig. 1-2, a database switching method for reducing data loss according to the present invention includes:

a MySQL cluster framework of a master-slave mode and a slave-slave mode with 3 nodes is built.

In this example, the MySQL database is targeted, with a version number of 5.7.28 community versions. The overall architecture of each node is shown in fig. 1. 3 database servers are prepared, and database software of MySQL 5.7 is installed respectively. A different server _ id is set for each instance and a copy mode for the GTID is turned on. An account number of a replay slave authority is created in a master library, then a change master of a slave library is configured to be connected to the master library, and the master _ auto _ position is designated as on. After configuration is complete, the copy thread is started with a start slave.

Configuring a master-slave structure as semi-synchronous copy and adding a synchronous timeout mechanism.

Firstly, semi-synchronously copied plug-in plugins are loaded on a master node and a slave node respectively, and then relevant parameters are set at a global level. The method comprises the steps of opening a semi-synchronous copy state, opening a Loss-Less semi-synchronous copy mode, setting the timeout time of waiting for 1 slave acknowledgement ack only and waiting for ack. The partial parameter settings involved are listed below and the settings are written into the parameter file.

INSTALL PLUGIN rpl_semi_sync_master SONAME'semisync_master.so'；

INSTALL PLUGIN rpl_semi_sync_slave SONAME'semisync_slave.so'；

SET GLOBAL rpl_semi_sync_master_enabled＝1；

SET GLOBAL rpl_semi_sync_slave_enabled＝1；

SET GLOBAL rpl_semi_sync_master_timeout＝5000；

SET GLOBAL rpl_semi_sync_master_wait_point＝AFTER_SYNC；

SET GLOBAL rpl_semi_sync_master_wait_for_slave_count＝1；

A binlog server is used to build an independent binary log server.

Then 1 independent server is prepared for storing binary log binlog server. Using the previously created replicated account number, a mysql brinlog is used on the binlog server to remotely obtain the incremental binary log of the master library. It is necessary to use the raw mode to ensure that the copied binlog is stored in binary format, and to use the stop-new parameter to continuously pull the binlog from the master node, to continuously backup to the current last one, and to continue.

And building a third-party management node by using the MHA, and deploying a MySQL switching script.

And installing the MHA packet on a third-party management node to be used as a management tool. Tools such as a master _ check _ repl tool, a master _ check _ status tool and the like are respectively used for detecting whether the master-slave data replication state and the MHA service running state are normal or not at the management node. The location of the binlog server, including the IP address and the specific binary log deposit path, needs to be specified in the configuration file binlog1 module. And finally, rewriting logic of a master _ ip _ failover _ script switching script in the MHA to satisfy the database switching logic mentioned in the technical scheme, wherein the main switching logic is a data recovery flow chart shown in fig. 2.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims. It should be noted that the various technical features described in the above embodiments can be combined in any suitable manner without contradiction, and the invention is not described in any way for the possible combinations in order to avoid unnecessary repetition. In addition, any combination of the various embodiments of the present invention is also possible, and the same should be considered as the disclosure of the present invention as long as it does not depart from the spirit of the present invention.

Claims

1. A database switching method for reducing data loss is characterized by comprising the following steps:

2. The database switching method for reducing data loss according to claim 1, wherein in step 1, a database cluster having at least three nodes is created and configured in a semi-synchronous replication mode, a certain timeout time is set for synchronization, and the timeout is stepped back to the asynchronous replication mode.

3. The database switching method for reducing data loss as claimed in claim 1, wherein in step 1, if the transaction of the client at the master library is always in a waiting state due to using the semi-synchronous replication, a synchronous timeout mechanism needs to be added, if the waiting time exceeds the configured timeout period during the waiting process, no slave node notifies the current transaction, the master library is switched to the asynchronous replication, and when at least one semi-synchronous slave node catches up, the master library is switched to the semi-synchronous replication.

4. The database switching method for reducing data loss according to claim 1, wherein in the step 2, the binary log of the database records all statements of data modification operation, and records the statements in the form of events.

5. The database switching method for reducing data loss according to claim 1, wherein in step 2, a binary log server is accessed, the binary log on the specified database instance is pulled in real time to be stored locally, a daemon process is started on the log server, and an IO thread simulated as a slave library is connected with the master library; after receiving the request, the master library creates a dump thread and pushes the binary logs of the master library to other slave library nodes and a log server in the cluster respectively; and directly restoring the latest data from the binary log of the slave library by using the SQL thread, and storing the binary log to the log server for incremental data restoration in the database switching process.

6. The database switching method for reducing data loss according to claim 1, wherein in step 3, a third-party management node is used to perform heartbeat detection on the cluster, and periodically monitor the states of all database cluster nodes, including whether each server node can be connected, whether each database instance can be connected, and whether the master-slave replication state is normal.

7. The database switching method for reducing data loss according to claim 1, wherein in step 3, the script for database failover is deployed on the management node, when the main library fails, the script is first connected to the main library server to store the binary log to the management node, and if the main library fails to be connected, the script is sent to the binary log server to obtain the log of the incremental data; then judging whether the semi-synchronous replication of the cluster nodes is normal or not, directly recovering the local log in a semi-synchronous mode, and recovering the binary log stored in the management node in an asynchronous mode; and finally, the slave library recovered to the latest data is promoted to be the master library, and the database switching process is completed.