CN111163120A

CN111163120A - Data storage and transmission method and device of distributed database and storage medium

Info

Publication number: CN111163120A
Application number: CN201811326286.7A
Authority: CN
Inventors: 尹亮; 赵钊
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Cloud Computing Ltd
Priority date: 2018-11-08
Filing date: 2018-11-08
Publication date: 2020-05-15

Abstract

The application provides a data storage transmission method and device of a distributed database and a non-transitory computer readable storage medium. The data storage and transmission method of the distributed database comprises the following steps: receiving a data writing request, and determining a first data node where a primary copy of data to be written is located and a first network mode of the first data node according to the data writing request; sending the data to be written to the first data node, and writing the data to be written in the first data node in the first network mode to serve as the primary copy; determining a second data node where the secondary copy of the data to be written is located and a second network mode of the second data node according to the data writing request, wherein the second data node and the first data node are in different regions, and the second network mode is different from the first network mode; and instructing the first data node to send the data to be written to the second data node in the second network mode, and writing the data to be written in the second data node in the second network mode to serve as the secondary copy.

Description

Data storage and transmission method and device of distributed database and storage medium

Technical Field

The present application relates to the field of database storage technologies, and in particular, to a method and an apparatus for data storage and transmission of a distributed database, and a non-transitory computer-readable storage medium.

Background

Distributed database systems have gained increasing popularity since their advent. At present, more and more enterprises adopt a distributed database system to establish a database of the enterprises or a system for management or service in the enterprises.

Distributed database systems typically employ multiple computers as nodes therein, each of which may be physically located separately, i.e., different nodes may be located in different locations or regions. Each node in a distributed database system may have a full copy, or partial copy, of the database management system with its own local database.

Disclosure of Invention

The application provides a data storage transmission method and device of a distributed database and a non-transitory computer readable storage medium.

According to a first aspect of the present application, a data storage and transmission method for a distributed database is provided, which includes:

receiving a data writing request, and determining a first data node where a primary copy of data to be written is located and a first network mode of the first data node according to the data writing request;

sending the data to be written to the first data node, and writing the data to be written in the first data node in the first network mode to serve as the primary copy;

determining a second data node where the secondary copy of the data to be written is located and a second network mode of the second data node according to the data writing request, wherein the second data node and the first data node are in different regions, and the second network mode is different from the first network mode;

and instructing the first data node to send the data to be written to the second data node in the second network mode, and writing the data to be written in the second data node in the second network mode to serve as the secondary copy.

According to a second aspect of the present application, there is provided an apparatus comprising:

a processor;

a memory for storing one or more programs;

the one or more programs, when executed by the processor, cause the processor to implement any of the methods described above.

According to a third aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to implement any one of the methods as described above.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 illustrates a flow chart of a method for data storage transmission of a distributed database according to an embodiment of the present application;

FIG. 2 illustrates a flow diagram of determining a first data node where a primary copy of data to be written is located and a first network mode of the first data node according to a data write request according to an embodiment of the present application;

FIG. 3 illustrates a flow chart of determining a second data node where a secondary copy of data to be written is located and a second network mode of the second data node according to a data write request according to an embodiment of the present application;

FIG. 4 illustrates a flow diagram for writing data to be written at a first data node in a first network mode according to one embodiment of the present application;

FIG. 5 shows a flow diagram for instructing a first data node to send data to be written to a second data node in a second network mode according to an embodiment of the present application;

FIG. 6 is a flow chart illustrating a method for data storage transmission of a distributed database according to another embodiment of the present application;

FIG. 7 shows a system architecture diagram of a distributed database according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described in detail below with reference to the accompanying drawings. It should be noted that the following description is merely exemplary in nature and is not intended to limit the present application. Further, in the following description, the same reference numbers will be used to refer to the same or like parts in different drawings. The different features in the different embodiments described below can be combined with each other to form further embodiments within the scope of the application.

Fig. 1 shows a flow chart of a data storage transmission method of a distributed database according to an embodiment of the present application. As shown in fig. 1, the method 100 may include steps S110 to S140. In step S110, a data write request is received, and a first data node where a primary copy of data to be written is located and a first network mode of the first data node are determined according to the data write request. When a user needs to write data into the distributed database, a data write request may be issued (for example, a certain physical unit of the data file may be saved for the first time, or a modification may be performed on a certain physical unit of the data file). Based on the data write request, the database management system may determine a first data node at which the primary copy of the data to be written is located. When the data is written for the first time, a first data node for storage can be allocated for the primary copy of the data; when the data is a modified content for a physical unit that already exists within the distributed database system, then the storage location of the primary copy of the original physical unit, i.e., at which data node, is determined. Furthermore, a network mode of the first data node needs to be determined. For the primary copy of the data to be written, the data can be written in a first network mode supported by the first data node. According to one embodiment, the first network mode supported by the first data node may be a low latency network mode, e.g. an RDMA (remote direct Memory Access) network mode. The RDMA network mode is designed to reduce delay of data processing in network transmission, requires dedicated hardware, and is efficient in a high-quality network environment, but is inefficient in data transmission in a low-quality network such as a cross-region network.

Subsequently, in step S120, the data to be written is sent to the first data node, and the data to be written is written in the first network mode at the first data node as a primary copy. That is to say, under the condition that the first data node supports the low-latency network mode, the data transmission and writing efficiency of the primary copy of the data to be written can be greatly improved.

In step S130, a second data node where the secondary copy of the data to be written is located and a second network mode of the second data node are determined according to the data write request. In this embodiment, the second data node is in a different region from the first data node, and the second network mode is different from the first network mode. After the primary copy is written, to ensure the reliability of the data, a secondary copy may be saved in the distributed database system for the data to be written. Based on the data write request, the database management system may determine a second data node at which the secondary copy of the data to be written is located. When the data is written for the first time, a second data node for storage can be allocated for the secondary copy; when the data is a modified version of a physical unit that already exists within the distributed database system, then the storage location of the secondary copy of the original physical unit, i.e., at which data node, is determined. Furthermore, a network mode of the second data node needs to be determined. And for the secondary copy of the data to be written, writing the data by adopting a second network mode supported by the second data node. In addition, the primary copy and the secondary copy are stored in data nodes of different regions, that is, the data replication of the primary copy and the secondary copy is cross-regional. According to an embodiment, the second network mode supported by the second data node may be a high latency network mode, for example, a TCP/IP (Transmission control Protocol/Internet Protocol) network mode. The TCP/IP network mode is the most basic protocol for data transmission, has strong versatility, but is not designed for a scenario of high-performance network communication, and has higher data transmission efficiency than the RDMA network mode in a low-quality network such as a cross-region network.

Subsequently, in step S140, the first data node is instructed to send the data to be written to the second data node in the second network mode, and the data to be written is written to the second data node in the second network mode as the secondary copy. That is to say, under the condition that the second data node supports the high-latency network mode, the cross-region data transmission and writing efficiency between the primary copy and the secondary copy of the data to be written can also be improved.

Therefore, for one data in the distributed database system, the primary copy and the secondary copy are respectively stored on different data nodes, so that the whole distributed database system can have complete data redundancy in different regions, the reliability of the data is improved, and remote disaster recovery is realized. In order to improve the reliability of the database and avoid user data loss due to data center level disasters, a plurality of copies exist for each physical fragment of data (each physical unit of a data file), and the management of data copies can be at least located in two data centers, wherein a primary copy is used for undertaking data writing, and a secondary copy and the primary copy form a copy relationship.

In addition, compared with a distributed database system only adopting a single network mode, different network modes can be adopted for different data nodes to carry out data transmission and writing, so that the network mode can be selected in a self-adaptive manner according to the physical position of a data unit, the data transmission and writing efficiency is improved, and the respective advantages of two transmission networks in a high-quality network and a low-quality network are fully exerted. For a distributed database system adopting a single network mode, if a single TCP/IP protocol is adopted as a data transmission layer, the reliability of data can be realized by remote copy, and the data consistency can be realized by synchronous copy. On the other hand, although the RDMA network mode is adopted between the computing and storage nodes, the delay is usually about 1/5 of the TCP/IP network mode, if only the RDMA protocol is adopted as the implementation of the data transmission layer, the data transmission is difficult to span across data centers (i.e. data nodes in different regions), and the data copy cannot be disaster-tolerant across regions, and cannot meet the requirement of the reliability of the business database. For an enterprise-level database, it is usually necessary to satisfy disaster recovery in different places, i.e. in different regions, a plurality of sets of the same database are constructed, and a service can be taken over instantly after a disaster occurs. Therefore, under a computing and storing separation architecture, storage needs to have complete redundancy in different regions, and real-time performance needs to be guaranteed, that is, data of different regions need to be completely consistent at any time. The limitation of RDMA networks is that the requirement for network links is high, on one hand, dedicated hardware is required, and on the other hand, if data transmission crosses data centers (i.e., data nodes in different regions), there is a high probability that the throughput is greatly reduced due to network congestion and network card data packet retransmission. Therefore, when the RDMA network is adopted, the multi-copy copying of the data is difficult to cross the data center, the requirement of realizing the remote real-time data copying is difficult, and the data security of the database cannot be guaranteed.

Fig. 2 shows a flowchart of determining a first data node where a primary copy of data to be written is located and a first network mode of the first data node according to a data write request according to an embodiment of the present application. As shown in fig. 2, the above step S110 may include sub-steps S111 and S112.

In sub-step S111, a data routing table is created or queried according to the data write request. The data routing table is a table for recording data nodes where the primary copy and the secondary copy of the data to be written are located and network modes of the data nodes. The database management system can look up the storage positions of the primary copy and the secondary copy of each physical unit of the data file and the network mode of the data nodes through the data routing table. When the data to be written is written for the first time, a data routing table can be created for the data to be written, or an entry is added to the existing data routing table to record the storage positions of the primary copy and the secondary copy of the data and the network mode of the data node. When the data to be written is the modification content of a physical unit existing in the distributed database system, the storage positions of the primary copy and the secondary copy of the original physical unit and the network mode of the data node can be determined by inquiring an existing data routing table.

Subsequently, in sub-step S112, the first data node where the primary copy is located and the first network mode of the first data node are determined according to the data routing table. Therefore, the storage positions of the primary copy and the secondary copy of the physical unit of the data file and the network mode of the data node can be recorded by creating the data routing table in the distributed database system, so that when a user modifies the physical unit, the storage position of the primary copy can be found, and the data transmission and storage are carried out by adopting the appropriate network mode.

Fig. 3 shows a flowchart of determining a second data node where a secondary copy of data to be written is located and a second network mode of the second data node according to a data write request according to an embodiment of the present application. As shown in fig. 3, the above step S130 may include sub-steps S131 and S132.

In sub-step S131, the data routing table is queried according to the data write request. As described above, the data routing table is a table that records the data nodes where the primary copy and the secondary copy of the data to be written are located and the network mode of the data nodes. The database management system can look up the storage positions of the primary copy and the secondary copy of each physical unit of the data file and the network mode of the data nodes through the data routing table. When the data to be written is written for the first time, a data routing table can be created for the data to be written, or an entry is added to the existing data routing table to record the storage positions of the primary copy and the secondary copy of the data and the network mode of the data node. When the data to be written is the modification content of a physical unit existing in the distributed database system, the storage positions of the primary copy and the secondary copy of the original physical unit and the network mode of the data node can be determined by inquiring an existing data routing table.

Subsequently, in sub-step S132, the second data node where the secondary copy is located and the second network mode of the second data node are determined according to the data routing table. Therefore, the storage position of the secondary copy of the physical unit of the data file and the network mode of the data node can be recorded through the data routing table in the distributed database system, so that when a user modifies the physical unit, the storage position of the secondary copy can be found, and data transmission and storage are carried out by adopting a proper network mode.

Fig. 4 shows a flow chart for writing data to be written in a first network mode at a first data node according to an embodiment of the present application. As shown in fig. 4, the step S120 may include sub-steps S121 to S123.

In substep S121, it is detected whether the first data node has a first transmission link in the first network mode for transmitting the data to be written. As described above, it is confirmed that the data transmission is performed in the first network mode in the first data node through step S110. In the first data node, the data to be written can then be transmitted therein only via the transmission link of the first network mode. Therefore, it is necessary to detect whether the first data node has the transmission link.

If the first data node has a first transmission link in the first network mode for transmitting the data to be written, then in sub-step S122, the first transmission link may be directly multiplexed to write the data to be written in the first network mode at the first data node.

If the first data node does not have a first transmission link in the first network mode for transmitting the data to be written, then in sub-step S123, a first transmission link in the first network mode needs to be created for transmitting the data to be written.

Fig. 5 shows a flow chart for instructing a first data node to send data to be written to a second data node in a second network mode according to an embodiment of the present application. As shown in fig. 5, the step S140 may include sub-steps S141 to S143.

In sub-step S141, it is detected whether there is a second transmission link in the second network mode between the first data node and the second data node for transmitting the data to be written. As described above, it is confirmed that the data transmission is performed in the second network mode in the second data node through step S130. In the second data node, the data to be written can then be transmitted therein only via the transmission link of the second network mode. Since the secondary replica in the second data node is replicated to the primary replica in the first data node. Therefore, it is necessary to detect whether there is such a transmission link between the first data node and the second data node.

If there is a second transmission link in the second network mode between the first data node and the second data node for transmitting the data to be written, then in sub-step S142, the second transmission link may be directly multiplexed to transmit the data to be written to the second data node in the second network mode by the first data node.

If there is no second transmission link in the second network mode between the first data node and the second data node for transmitting the data to be written, then in sub-step S143, a second transmission link in the second network mode needs to be created for transmitting the data to be written.

Fig. 6 shows a flowchart of a data storage transmission method of a distributed database according to another embodiment of the present application. As shown in fig. 6, the method 100 may further include a step S150 in addition to the steps S110 to S140. For the sake of brevity, only the differences of the embodiment shown in fig. 6 from fig. 1 will be described below, and detailed descriptions of the same parts will be omitted.

In step S150, the first data node is instructed to copy the data to be written in the first network mode and store the data in the first data node as an additional secondary copy. Therefore, an additional secondary copy can be arranged in the first data node where the primary copy is located, so that data redundancy is provided in the first data node, and low delay of local communication is facilitated while remote disaster tolerance is achieved.

FIG. 7 shows a system architecture diagram of a distributed database according to an embodiment of the present application. As shown in FIG. 7, the distributed database system may include IDCs located in City A (Internet Data Center: Internet Data Center) that support RDMA network mode and IDCs located in City B that do not support RDMA network mode. There is a SQL engine and multiple storage engines in the IDCs of both cities. In each storage engine, a plurality of data blocks are included for storing copies (including primary and secondary copies) of a physical unit of a data file.

For example, when a system receives a data write request from a user, the system first converts the data write request into a data block modification. The storage location of the primary copy of the data to be written (i.e., which IDC it is in) and the network mode are then looked up in the data routing table. Table 1 below is an example of a data routing table in which data files, data file fragments (physical units), all copies of a data file fragment (including primary, secondary, and additional secondary copies), a data center in which each copy is located, and network modes supported for accessing the data center are recorded.

Table 1: examples of data routing tables

From the results of the query of the data routing table, the primary copy of the data file fragment is located in IDC of City A, as shown by data block 701 in FIG. 7, which supports RDMA network mode. Subsequently, it is checked whether the network link specified by the primary replica exists. If the current data exists, the current data can be directly multiplexed; if not, a new RDMA network link needs to be created. In this way, the storage engine where the primary copy is located receives the data write on the network link, thereby completing the write of the primary copy.

The data then needs to be copied from the primary to the secondary. First, the storage location of the secondary copy of the data to be written (i.e., which IDC it is in) and the network mode are looked up in the data routing table. As shown in Table 1 above, the secondary copy of the data file fragment is located in IDC in City B, as shown by data block 702 in FIG. 7, which supports TCP/IP network mode. Subsequently, it is checked whether the network link specified by the secondary copy exists. If the current data exists, the current data can be directly multiplexed; if not, a new TCP/IP network link needs to be created. In this manner, the data will be copied from the primary replica to the secondary replica via the network link, thereby completing the copying of the data.

In addition, the data may also be copied from the primary copy to additional secondary copies, which may be located in the same data center as the primary copy, e.g., the additional secondary copy File-partitionN-replica2 recorded in table 1 above, which is also located in IDC in city a, as shown by data block 703 in fig. 7, which also supports RDMA network mode. Subsequently, it may be checked whether the network link specified by the additional secondary replica exists. If the current data exists, the current data can be directly multiplexed; if not, a new RDMA network link needs to be created. In this manner, the data will be replicated by the primary replica to additional secondary replicas through the network link, thereby completing the replication of the data, which is done within the same data center.

According to another aspect of the application, there is provided an apparatus comprising a processor and a memory, the memory storing one or more programs which, when executed by the processor, cause the processor to carry out the method as described above.

According to yet another aspect of the application, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, may cause the processor to carry out the method as described above.

As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to as a "circuit," module "or" system. Furthermore, the present application may take the form of a computer program product embodied in any tangible expression medium having computer-usable program code embodied in the medium.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Although the above description includes many specific arrangements and parameters, it should be noted that these specific arrangements and parameters are merely illustrative of one embodiment of the present application. This should not be taken as limiting the scope of the application. Those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the application. Accordingly, the scope of the application should be construed based on the claims.

Claims

1. A data storage and transmission method of a distributed database comprises the following steps:

2. The method of claim 1, wherein receiving a data write request and determining a first data node at which a primary copy of data to be written is located and a first network mode of the first data node based on the data write request comprises:

creating or inquiring a data routing table according to the data writing request, wherein the data routing table records data nodes where the primary copy and the secondary copy of the data to be written are located and network modes of the data nodes;

and determining a first data node where the primary copy is located and a first network mode of the first data node according to the data routing table.

3. The method of claim 2, wherein determining, from the data write request, a second data node at which the secondary copy of the data to be written is located and a second network mode of the second data node comprises:

inquiring the data routing table according to the data writing request;

and determining a second data node where the secondary copy is located and a second network mode of the second data node according to the data routing table.

4. The method of claim 1, wherein sending the data to be written to the first data node and writing the data to be written in the first network mode at the first data node as the primary replica comprises:

detecting whether the first data node has a first transmission link in the first network mode for transmitting the data to be written;

if yes, writing the data to be written in the first network mode at the first data node through the first transmission link;

and if not, creating a first transmission link in the first network mode for transmitting the data to be written.

5. The method of claim 4, wherein instructing the first data node to send the data to be written to the second data node in the second network mode and to write the data to be written to the second data node in the second network mode as the secondary copy comprises:

detecting whether a second transmission link in the second network mode for transmitting the data to be written exists between the first data node and the second data node;

if yes, the first data node is instructed to send the data to be written to the second data node in the second network mode through the second transmission link;

and if not, creating a second transmission link in the second network mode for transmitting the data to be written.

6. The method of claim 1, further comprising:

and instructing the first data node to copy the data to be written in the first network mode and store the data in the first data node as an additional secondary copy.

7. The method of claim 1, wherein the first network mode is a low latency network mode and the second network mode is a high latency network mode.

8. The method of claim 1, wherein the first network mode is an RDMA network mode and the second network mode is a TCP/IP network mode.

9. An apparatus, comprising:

a processor;

a memory for storing one or more programs;

the one or more programs, when executed by the processor, cause the processor to implement the method of any of claims 1-8.

10. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to implement the method of any one of claims 1-8.