[go: up one dir, main page]

CN114036129A - Database switching method for reducing data loss - Google Patents

Database switching method for reducing data loss Download PDF

Info

Publication number
CN114036129A
CN114036129A CN202111315451.0A CN202111315451A CN114036129A CN 114036129 A CN114036129 A CN 114036129A CN 202111315451 A CN202111315451 A CN 202111315451A CN 114036129 A CN114036129 A CN 114036129A
Authority
CN
China
Prior art keywords
database
library
semi
slave
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111315451.0A
Other languages
Chinese (zh)
Inventor
刘小龙
唐军
周文龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Cric Technology Co ltd
Original Assignee
Sichuan Cric Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Cric Technology Co ltd filed Critical Sichuan Cric Technology Co ltd
Priority to CN202111315451.0A priority Critical patent/CN114036129A/en
Publication of CN114036129A publication Critical patent/CN114036129A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • G06F11/3476Data logging
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a database switching method for reducing data loss, which is characterized in that by combining the characteristics of semi-synchronous replication and MHA cluster schemes, in a master-slave architecture, a replicated cluster is required to have at least three database servers, and if only one slave node receives the latest binary log, the latest binary log can be applied to all other slave servers. On the basis, a third-party log server is introduced, and when the main library transaction is submitted, the logs of the local and third-party log servers are written simultaneously. Even if the semi-synchronous component fails and the main database server cannot access the semi-synchronous component, the logs can still be obtained from the third-party log server, so that the risk of data loss caused by the missing of the binary logs in the whole database switching process is avoided.

Description

Database switching method for reducing data loss
Technical Field
The invention relates to the technical field of databases, in particular to a database switching method for reducing data loss.
Background
The available architecture is basically standard for internet services, and both application services and database services need to be highly available. For a system, many modules may be included, such as front-end applications, caches, databases, message queues, etc., and each module needs to be highly available to ensure high availability of the whole system. For database services, high availability may be more complicated, and services available to users, not only access, but also need to have correctness guarantee, so when discussing a high availability scheme for a database, data consistency problems in the scheme are generally considered at the same time.
In the current high-availability architecture of the MySQL database, semi-synchronous replication is often used to ensure the data consistency of database switching. In a master-slave architecture, when a master library commits a transaction, the commit needs to be completed after receiving feedback from the slave library written in the log. But in the data synchronization process, semi-synchronization may degrade to asynchronous replication due to network problems, large transaction problems, etc. In the asynchronous replication, the transaction submitted by the master library is only guaranteed to be written into the log of the master server, and the real-time switching between the master and the slave can cause that the slave library does not have the latest data with the master library. Data consistency cannot be guaranteed if it relies entirely on asynchronous replication or semi-synchronous replication components. In many MHA cluster solutions, trying to save binary logs from a down main server during automatic failover ensures that data is not lost to the greatest extent, but this is not always feasible. If the main server hardware fails or cannot be accessed through ssh, the MHA cannot save the binary log, and only failover is performed, and the latest data is lost.
Disclosure of Invention
The present invention provides a database switching method for reducing data loss to solve the above problems, and combines the features of semi-synchronous replication and MHA clustering schemes, and in a master-slave architecture, requires that there must be at least three database servers in a replication cluster, and if only one slave node has received the latest binary log, the latest binary log can be applied to all other slave servers. On the basis, a third-party log server is introduced, and when the main library transaction is submitted, the logs of the local and third-party log servers are written simultaneously. Even if the semi-synchronous component fails and the main database server cannot access the semi-synchronous component, the logs can still be obtained from the third-party log server, so that the risk of data loss caused by the missing of the binary logs in the whole database switching process is avoided.
The invention realizes the purpose through the following technical scheme:
a database switching method for reducing data loss comprises the following steps:
step 1, a multi-node database cluster based on semi-synchronous replication is established, and a synchronous timeout mechanism is added;
step 2, accessing a binary log server, and storing incremental data generated by a main database in real time;
and 3, creating a third-party management node to monitor the cluster state, and deploying a database switching script.
In the step 1, a database cluster with at least three nodes is created and configured as a semi-synchronous replication mode, and a certain timeout time is set for synchronization, i.e. the database cluster is stepped back to an asynchronous replication mode.
In the step 1, if the transaction of the client in the master library is always in a waiting state due to the semi-synchronous replication, a synchronous timeout mechanism needs to be added, if the waiting time exceeds the configured timeout time in the waiting process, and no slave node notifies the current transaction, the master library is converted into asynchronous replication, and when at least one semi-synchronous slave node catches up, the master library is converted into semi-synchronous replication.
Further, in the step 2, the binary log of the database records all statements of the data modification operation, and records the statements in the form of events.
A further scheme is that in the step 2, a binary log server is accessed, a binary log on an appointed database instance is pulled to be stored locally in real time, a daemon process is started on the log server, and an IO thread simulated as a slave library is connected with a master library; after receiving the request, the master library creates a dump thread and pushes the binary logs of the master library to other slave library nodes and a log server in the cluster respectively; and directly restoring the latest data from the binary log of the slave library by using the SQL thread, and storing the binary log to the log server for incremental data restoration in the database switching process.
In step 3, heartbeat detection is performed on the cluster by using a third-party management node, and the states of all database cluster nodes are regularly monitored, including whether each server node can be connected, whether each database instance can be connected, and whether the master-slave replication state is normal.
In the step 3, a script for database fault switching is deployed on the management node, when the main library fails, the script is firstly connected to the main library server to store the binary log to the management node, and if the main library fails to be connected, the script is used for obtaining the log of the incremental data from the binary log server; then judging whether the semi-synchronous replication of the cluster nodes is normal or not, directly recovering the local log in a semi-synchronous mode, and recovering the binary log stored in the management node in an asynchronous mode; and finally, the slave library recovered to the latest data is promoted to be the master library, and the database switching process is completed.
The invention has the beneficial effects that:
according to the database switching method for reducing data loss, the third-party management node monitors the semi-synchronous replication cluster of the database, and then the semi-synchronous replication cluster is accessed to the independent binary log server to store incremental data, so that the risk of data loss in the MySQL database fault switching process is reduced.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the following briefly introduces the embodiments or the drawings needed to be practical in the prior art description, and obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is an architecture diagram of the nodes of a cluster.
FIG. 2 is a flow diagram of managing node data recovery.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
In any embodiment, as shown in fig. 1-2, a database switching method for reducing data loss according to the present invention includes the following steps:
step 1, a multi-node database cluster based on semi-synchronous replication is established, and a synchronous timeout mechanism is added;
creating a database cluster with at least three nodes, configuring the database cluster into a semi-synchronous replication mode, and setting a certain timeout time for synchronization, wherein the timeout time is stepped back to the asynchronous replication mode. MySQL is asynchronous by default, and the master library returns the result to the client immediately after executing the transaction submitted by the client, and does not care whether the slave library has been received and processed. This has the problem that if the master is hung up, the transaction that has been committed on the master may not be passed to the slave, and at this point a database switch may result in incomplete data on the new master. And by using semi-synchronous replication, the master library is not returned to the client immediately after the transaction submitted by the client is executed, but is returned to the client after at least one slave library is received and written into a local log, so that the safety of data synchronization is improved.
But the use of semi-synchronous replication in some specific scenarios may result in the client's transactions at the home base being always waiting because of the need to add a synchronization timeout mechanism, possibly for network reasons or for the impact of large transactions. If the waiting time exceeds the configured timeout time in the waiting process, no slave node informs the current transaction, the master library is converted into asynchronous replication, and when at least one semi-synchronous slave node catches up, the master library is converted into semi-synchronous replication.
Step 2, accessing a binary log server, and storing incremental data generated by a main database in real time;
the binary log of the database records all statements of data modification operation, records the statements in the form of events, and can restore the database to any time point by the aid of the complete binary log. In the cluster node, data synchronization is also realized through synchronization of the binary logs. Although the semi-synchronous replication mode is adopted, the condition is degraded into asynchronous replication in the period of cluster operation. There is still a risk of data loss when a database failover occurs under asynchronous replication conditions.
And when the binary log server is accessed, the binary log on the specified database instance is pulled to be stored locally in real time. Starting a daemon process on the log server, and simulating an IO thread of a slave library to be connected with the master library. And after receiving the request, the master library creates a dump thread and pushes the binary logs of the master library to other slave library nodes and log servers in the cluster respectively. And directly restoring the latest data from the binary log of the slave library by using the SQL thread, and storing the binary log to the log server for incremental data restoration in the database switching process.
Step 3, establishing a third-party management node to monitor the cluster state, and deploying a database switching script;
and performing heartbeat detection on the cluster by using a third-party management node, and regularly monitoring the states of all database cluster nodes, including whether each server node can be communicated, whether each database instance can be connected, whether the master-slave replication state is normal and the like. The management process is independent outside the cluster nodes, so that on one hand, the split problem caused by network partition faults is avoided, and on the other hand, one set of management nodes can simultaneously manage a plurality of sets of database replication clusters, so that the whole management platform architecture is clearer and more stable.
The method comprises the following steps that a script for switching database failures is also deployed on a management node, when a main library fails, the script is firstly connected to a main library server to store a binary log to the management node, and if the main library fails to be connected, the binary log server obtains the log of incremental data; then judging whether the semi-synchronous replication of the cluster nodes is normal or not, directly recovering the local log in a semi-synchronous mode, and recovering the binary log stored in the management node in an asynchronous mode; and finally, the slave library recovered to the latest data is promoted to be the master library, and the database switching process is completed.
In a specific embodiment, as shown in fig. 1-2, a database switching method for reducing data loss according to the present invention includes:
a MySQL cluster framework of a master-slave mode and a slave-slave mode with 3 nodes is built.
In this example, the MySQL database is targeted, with a version number of 5.7.28 community versions. The overall architecture of each node is shown in fig. 1. 3 database servers are prepared, and database software of MySQL 5.7 is installed respectively. A different server _ id is set for each instance and a copy mode for the GTID is turned on. An account number of a replay slave authority is created in a master library, then a change master of a slave library is configured to be connected to the master library, and the master _ auto _ position is designated as on. After configuration is complete, the copy thread is started with a start slave.
Configuring a master-slave structure as semi-synchronous copy and adding a synchronous timeout mechanism.
Firstly, semi-synchronously copied plug-in plugins are loaded on a master node and a slave node respectively, and then relevant parameters are set at a global level. The method comprises the steps of opening a semi-synchronous copy state, opening a Loss-Less semi-synchronous copy mode, setting the timeout time of waiting for 1 slave acknowledgement ack only and waiting for ack. The partial parameter settings involved are listed below and the settings are written into the parameter file.
INSTALL PLUGIN rpl_semi_sync_master SONAME'semisync_master.so';
INSTALL PLUGIN rpl_semi_sync_slave SONAME'semisync_slave.so';
SET GLOBAL rpl_semi_sync_master_enabled=1;
SET GLOBAL rpl_semi_sync_slave_enabled=1;
SET GLOBAL rpl_semi_sync_master_timeout=5000;
SET GLOBAL rpl_semi_sync_master_wait_point=AFTER_SYNC;
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count=1;
A binlog server is used to build an independent binary log server.
Then 1 independent server is prepared for storing binary log binlog server. Using the previously created replicated account number, a mysql brinlog is used on the binlog server to remotely obtain the incremental binary log of the master library. It is necessary to use the raw mode to ensure that the copied binlog is stored in binary format, and to use the stop-new parameter to continuously pull the binlog from the master node, to continuously backup to the current last one, and to continue.
And building a third-party management node by using the MHA, and deploying a MySQL switching script.
And installing the MHA packet on a third-party management node to be used as a management tool. Tools such as a master _ check _ repl tool, a master _ check _ status tool and the like are respectively used for detecting whether the master-slave data replication state and the MHA service running state are normal or not at the management node. The location of the binlog server, including the IP address and the specific binary log deposit path, needs to be specified in the configuration file binlog1 module. And finally, rewriting logic of a master _ ip _ failover _ script switching script in the MHA to satisfy the database switching logic mentioned in the technical scheme, wherein the main switching logic is a data recovery flow chart shown in fig. 2.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims. It should be noted that the various technical features described in the above embodiments can be combined in any suitable manner without contradiction, and the invention is not described in any way for the possible combinations in order to avoid unnecessary repetition. In addition, any combination of the various embodiments of the present invention is also possible, and the same should be considered as the disclosure of the present invention as long as it does not depart from the spirit of the present invention.

Claims (7)

1. A database switching method for reducing data loss is characterized by comprising the following steps:
step 1, a multi-node database cluster based on semi-synchronous replication is established, and a synchronous timeout mechanism is added;
step 2, accessing a binary log server, and storing incremental data generated by a main database in real time;
and 3, creating a third-party management node to monitor the cluster state, and deploying a database switching script.
2. The database switching method for reducing data loss according to claim 1, wherein in step 1, a database cluster having at least three nodes is created and configured in a semi-synchronous replication mode, a certain timeout time is set for synchronization, and the timeout is stepped back to the asynchronous replication mode.
3. The database switching method for reducing data loss as claimed in claim 1, wherein in step 1, if the transaction of the client at the master library is always in a waiting state due to using the semi-synchronous replication, a synchronous timeout mechanism needs to be added, if the waiting time exceeds the configured timeout period during the waiting process, no slave node notifies the current transaction, the master library is switched to the asynchronous replication, and when at least one semi-synchronous slave node catches up, the master library is switched to the semi-synchronous replication.
4. The database switching method for reducing data loss according to claim 1, wherein in the step 2, the binary log of the database records all statements of data modification operation, and records the statements in the form of events.
5. The database switching method for reducing data loss according to claim 1, wherein in step 2, a binary log server is accessed, the binary log on the specified database instance is pulled in real time to be stored locally, a daemon process is started on the log server, and an IO thread simulated as a slave library is connected with the master library; after receiving the request, the master library creates a dump thread and pushes the binary logs of the master library to other slave library nodes and a log server in the cluster respectively; and directly restoring the latest data from the binary log of the slave library by using the SQL thread, and storing the binary log to the log server for incremental data restoration in the database switching process.
6. The database switching method for reducing data loss according to claim 1, wherein in step 3, a third-party management node is used to perform heartbeat detection on the cluster, and periodically monitor the states of all database cluster nodes, including whether each server node can be connected, whether each database instance can be connected, and whether the master-slave replication state is normal.
7. The database switching method for reducing data loss according to claim 1, wherein in step 3, the script for database failover is deployed on the management node, when the main library fails, the script is first connected to the main library server to store the binary log to the management node, and if the main library fails to be connected, the script is sent to the binary log server to obtain the log of the incremental data; then judging whether the semi-synchronous replication of the cluster nodes is normal or not, directly recovering the local log in a semi-synchronous mode, and recovering the binary log stored in the management node in an asynchronous mode; and finally, the slave library recovered to the latest data is promoted to be the master library, and the database switching process is completed.
CN202111315451.0A 2021-11-08 2021-11-08 Database switching method for reducing data loss Pending CN114036129A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111315451.0A CN114036129A (en) 2021-11-08 2021-11-08 Database switching method for reducing data loss

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111315451.0A CN114036129A (en) 2021-11-08 2021-11-08 Database switching method for reducing data loss

Publications (1)

Publication Number Publication Date
CN114036129A true CN114036129A (en) 2022-02-11

Family

ID=80136690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111315451.0A Pending CN114036129A (en) 2021-11-08 2021-11-08 Database switching method for reducing data loss

Country Status (1)

Country Link
CN (1) CN114036129A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115617908A (en) * 2022-10-12 2023-01-17 河南星环众志信息科技有限公司 A MySQL data synchronization method, device, database terminal, medium and system
CN116340425A (en) * 2023-03-06 2023-06-27 平安付科技服务有限公司 Data management method, device, medium and computing equipment of MHA high-availability architecture
CN117931531A (en) * 2024-03-22 2024-04-26 腾讯科技(深圳)有限公司 Data backup system, method, device, equipment, storage medium and program product
CN119003539A (en) * 2024-06-18 2024-11-22 广州鲸旗网络科技有限公司 Method and system for hot-cutting MySql database under production environment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330035A (en) * 2017-06-26 2017-11-07 努比亚技术有限公司 Operation Log synchronous method, mobile terminal and computer-readable recording medium in a kind of database
CN108415951A (en) * 2018-02-02 2018-08-17 广东睿江云计算股份有限公司 A kind of database control method and system
CN110825763A (en) * 2020-01-08 2020-02-21 上海爱可生信息技术股份有限公司 MySQL database high-availability system based on shared storage and high-availability method thereof
CN111061594A (en) * 2019-12-02 2020-04-24 上海浪擎信息科技有限公司 Log logic analysis-based relational database data replication method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330035A (en) * 2017-06-26 2017-11-07 努比亚技术有限公司 Operation Log synchronous method, mobile terminal and computer-readable recording medium in a kind of database
CN108415951A (en) * 2018-02-02 2018-08-17 广东睿江云计算股份有限公司 A kind of database control method and system
CN111061594A (en) * 2019-12-02 2020-04-24 上海浪擎信息科技有限公司 Log logic analysis-based relational database data replication method
CN110825763A (en) * 2020-01-08 2020-02-21 上海爱可生信息技术股份有限公司 MySQL database high-availability system based on shared storage and high-availability method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KELE_BABA: "数据库之MHA高可用集群部署及故障切换", pages 1 - 11, Retrieved from the Internet <URL:https://blog.csdn.net/kele_baba/article/details/118404616> *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115617908A (en) * 2022-10-12 2023-01-17 河南星环众志信息科技有限公司 A MySQL data synchronization method, device, database terminal, medium and system
CN116340425A (en) * 2023-03-06 2023-06-27 平安付科技服务有限公司 Data management method, device, medium and computing equipment of MHA high-availability architecture
CN117931531A (en) * 2024-03-22 2024-04-26 腾讯科技(深圳)有限公司 Data backup system, method, device, equipment, storage medium and program product
WO2025195152A1 (en) * 2024-03-22 2025-09-25 腾讯科技(深圳)有限公司 Data backup system, method and apparatus, and device, storage medium and program product
CN119003539A (en) * 2024-06-18 2024-11-22 广州鲸旗网络科技有限公司 Method and system for hot-cutting MySql database under production environment

Similar Documents

Publication Publication Date Title
CN114036129A (en) Database switching method for reducing data loss
CN107291787B (en) Active-standby database switching method and device
US7392421B1 (en) Framework for managing clustering and replication
CN112506702B (en) Disaster recovery method, device, equipment and storage medium for data center
CN110535680B (en) A Byzantine Fault Tolerance Method
US20120151272A1 (en) Adding scalability and fault tolerance to generic finite state machine frameworks for use in automated incident management of cloud computing infrastructures
CN111966467B (en) Method and device for disaster recovery based on kubernetes container platform
WO2022036901A1 (en) Implementation method and apparatus for redis replica set
CN102360324A (en) Failure recovery method and equipment for failure recovery
CN111371599A (en) Cluster disaster recovery management system based on ETCD
CN114500289B (en) Control plane recovery method, device, control node and storage medium
CN114764380A (en) Distributed cluster control method and device based on ETCD
CN116166470B (en) Redis cluster cloning and replication method, device, medium, and equipment
CN118018463A (en) Fault handling method, device, equipment and readable storage medium
CN115314361B (en) A server cluster management method and related components
CN1275476C (en) Clustering system for utilizing sharing internal memory in mobile communiation system and realizing method thereof
WO2002001347A2 (en) Method and system for automatic re-assignment of software components of a failed host
CN110554933A (en) Cloud management platform, and cross-cloud high-availability method and system for cloud platform service
CN114356711A (en) Database fault self-healing method, system and related device
CN114422335A (en) Communication method, communication device, server and storage medium
CN116185697B (en) Container cluster management method, device and system, electronic equipment and storage medium
JP2012014673A (en) Cluster system restoration method, server and program
CN111581013A (en) System information backup and reconstruction method based on metadata and shadow files
Corsava et al. Intelligent architecture for automatic resource allocation in computer clusters
CN117667523A (en) Database cluster maintenance method and system for improving high availability of Oracle DG

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination