
HK1182804B - Distributed data mirroring method and data store nodes - Google Patents


Info

Publication number
HK1182804B
Authority
HK
Hong Kong
Prior art keywords
data
data storage
mirroring
destination
storage node
Prior art date
Application number
HK13110052.9A
Other languages
Chinese (zh)
Other versions
HK1182804A (en)
Inventor
段兵
朱国云
Original Assignee
Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Filing date
Publication date
Application filed by Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Publication of HK1182804A
Publication of HK1182804B


Description

Distributed data mirroring method and data storage node
Technical Field
The present application relates to the field of network storage, and in particular, to a distributed data mirroring method and a storage data node.
Background
With the continuous development of the internet, the volume of data on the internet has grown explosively, and so has the demand for data access capability. Such large volumes of data are vulnerable to loss caused by system failures and disk corruption. Guaranteeing the safety of massive data has therefore become a focus of attention.
Most current remote mirroring systems adopt a centralized mode. Taking mirroring between databases as an example, a typical centralized mirroring system uses a primary-secondary structure: one database is paired with another, one acting as the primary database, which is mainly responsible for synchronizing data to the other, and the other acting as the secondary database, which mainly receives the data mirrored from the primary.
In this scheme, if either the primary or the secondary database fails, the entire mirroring system stops operating, and mirroring can resume only after the failed database is recovered. Moreover, when handling massive data access, a centralized system is hard to scale out and easily becomes a performance bottleneck; adding further mirroring operations at that point places even greater pressure on the system.
Disclosure of Invention
The present application aims to provide a distributed data mirroring method and a storage data node in which a Distributed File System (DFS) serves as a data center. Any storage data node in the data center can act as the data master server of the mirroring system and mirror data to any storage data node in the DFS of a backup data center.
In order to achieve the above object, the present application provides a distributed data mirroring method, including:
receiving a data operation request, performing an operation corresponding to the data operation request on data, performing a backup operation corresponding to the data operation request on the data, and generating a logical file name of the data, wherein the logical file name includes file region information;
generating a mirror record comprising the logical file name, the mirroring policy and the operation type of the data;
acquiring a list of destination data storage node addresses to be mirrored to from the master control node of the mirror destination distributed file system determined by the file region information;
and, according to the mirroring policy and the operation type in the mirror record, performing the operation corresponding to the data operation request on the data at the destination data storage nodes corresponding to the addresses in the list.
The present application also provides a storage data node, including:
the request processing unit, configured to receive a data operation request, perform the operation corresponding to the data operation request on data, and generate a logical file name of the data;
the generating unit, configured to generate a mirror record comprising the logical file name, the mirroring policy and the operation type of the data, wherein the logical file name includes file region information;
the mirror address acquisition unit, configured to acquire a list of destination data storage node addresses to be mirrored to from the master control node of the mirror destination distributed file system determined by the file region information;
and the data mirroring unit, configured to perform, according to the mirroring policy and the operation type in the mirror record, the operation corresponding to the data operation request on the data at the destination data storage nodes corresponding to the addresses in the list.
Therefore, the distributed data mirroring method and the storage data node use the distributed file system (DFS) as a data center for data mirroring and select available storage data nodes to carry out the mirroring, which achieves high reliability of the mirroring system.
Drawings
Fig. 1 is an architecture diagram of a distributed mirroring system to which the distributed data mirroring method of the present application is applied.
Fig. 2 is a flowchart of an embodiment of a distributed data mirroring method according to the present application.
FIG. 3 is a flowchart of another embodiment of a distributed data mirroring method according to the present application.
FIG. 4 is a flowchart illustrating a distributed data mirroring method according to another embodiment of the present application.
FIG. 5 is a flowchart illustrating a distributed data mirroring method according to another embodiment of the present application.
Detailed Description
The technical solution of the present application is further described in detail by the accompanying drawings and examples.
A Distributed File System (DFS) serves as a data center. A distributed file system consists of multiple data storage nodes (DataServer) and one master control node (NameServer), and the distributed mirroring system consists of at least two distributed file systems. As a result, any data storage node in one data center's DFS can act as the data master server of the mirroring system and mirror data to any data storage nodes in the DFSs of multiple other data centers, which serve as backups.
Fig. 1 is a diagram illustrating the architecture of a distributed mirroring system to which the distributed data mirroring method of the present application is applied. The mirroring system shown consists of two data center DFS systems, and data mirroring for operations requested by client users is performed between the cluster of data center A and the cluster of data center B. Each DFS system consists of a master control node and N data storage nodes. The master control node manages the data storage nodes in the cluster, data distribution, data location and so on; the data storage nodes are responsible for data file management and data mirroring. Each storage data node continuously monitors data changes on itself and initiates the corresponding mirroring operation as soon as a change occurs.
A storage data node can receive mirrored data from other DFSs and can mirror its own data to other DFSs. Storage data nodes can join or leave a DFS at any time without affecting data availability: when a data file is stored on one storage data node of a single DFS, multiple backups are also made on other storage data nodes of the same DFS, so the file resides on different data storage nodes placed on different racks, and the failure of a single storage data node therefore does not affect the mirroring system. When a data storage node joins or leaves, the master control node migrates data, moving it to storage data nodes with lower load according to the capacity and load of each data storage node. This can be done by replicating multiple copies, or a single copy, of the data within the same DFS cluster, with the copies distributed across different racks. Storage and backup of data within a single DFS is prior art and is therefore not described in detail.
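Although the text treats intra-cluster replica placement as prior art, the two rules it names (prefer lower-load nodes, keep copies on different racks) can be illustrated with a minimal sketch. The `(address, rack, load)` tuple shape and the `place_replicas` helper are hypothetical; this is a generic greedy placement, not the patent's own procedure.

```python
def place_replicas(nodes: list[tuple[str, str, float]], copies: int) -> list[str]:
    """Pick `copies` storage nodes, least-loaded first, never two on the same rack.

    nodes: hypothetical (address, rack, load) tuples; load is a 0..1 utilization figure.
    """
    if copies <= 0:
        return []
    chosen: list[str] = []
    used_racks: set[str] = set()
    for address, rack, _load in sorted(nodes, key=lambda n: n[2]):
        if rack in used_racks:
            continue  # keep copies of one file on different racks
        chosen.append(address)
        used_racks.add(rack)
        if len(chosen) == copies:
            return chosen
    raise ValueError("not enough distinct racks for the requested number of copies")


nodes = [("n1", "r1", 0.9), ("n2", "r1", 0.1), ("n3", "r2", 0.5), ("n4", "r3", 0.7)]
print(place_replicas(nodes, 2))  # ['n2', 'n3']
```

The greedy order means a heavily loaded node is only used when no lighter node on an unused rack remains; a real master would also weigh remaining capacity.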
The distributed data mirroring system requires at least two data centers, such as the two shown in Fig. 1, each with its own distributed file system cluster (cluster A and cluster B). DFS cluster A and DFS cluster B are peers with no primary-secondary relationship. The storage data nodes, data distribution and data of each DFS are managed by the master control node within that cluster. The master control nodes of different DFSs exchange no information with each other, whereas any data storage nodes in different DFSs can communicate and transfer data with each other.
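The peer architecture described above can be modeled with a few data classes. This is a minimal sketch under stated assumptions: the class names and the `host:port` address strings are hypothetical, and the model only captures the two structural rules from the text (at least two DFS clusters, and a symmetric peer relation with no primary cluster).

```python
from dataclasses import dataclass, field


@dataclass
class DataServer:
    """A storage data node (DataServer); the address format is hypothetical."""
    address: str
    rack: str


@dataclass
class Cluster:
    """One data center's DFS: a master control node (NameServer) plus N DataServers."""
    name: str
    name_server: str
    data_servers: list[DataServer] = field(default_factory=list)


@dataclass
class MirroringSystem:
    """Peer DFS clusters; no cluster is primary and masters never talk to each other."""
    clusters: dict[str, Cluster] = field(default_factory=dict)

    def add_cluster(self, cluster: Cluster) -> None:
        self.clusters[cluster.name] = cluster

    def validate(self) -> None:
        # The text requires at least two data centers.
        if len(self.clusters) < 2:
            raise ValueError("distributed mirroring needs at least two DFS clusters")

    def peers_of(self, name: str) -> list[str]:
        # Any cluster may mirror to any other cluster: the relation is symmetric.
        return sorted(n for n in self.clusters if n != name)


system = MirroringSystem()
system.add_cluster(Cluster("A", "ns-a:9000", [DataServer("a1:9100", "rack1")]))
system.add_cluster(Cluster("B", "ns-b:9000", [DataServer("b1:9100", "rack1")]))
system.validate()
print(system.peers_of("A"))  # ['B']
```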
Fig. 2 is a flowchart of a distributed data mirroring method according to the present application. This embodiment is described from the perspective of a storage data node in one data center's DFS (cluster A): after receiving a data operation request, the storage data node acts as the data master server and mirrors the data to storage data nodes in another data center's DFS (cluster B). In this example the data operation request is a write of certain data; the same applies to a data update request.
With reference to Figs. 1 and 2, the present application is applied to a distributed data mirroring system formed by at least two distributed file systems, each including multiple storage data nodes and a master control node. A storage data node in cluster A performs the following steps:
step 11, receiving a data operation request, performing an operation corresponding to the data operation request on data, performing a backup operation corresponding to the data operation request on the data, and generating a logical file name for the data;
For example, if the data operation request is a write request, the storage data node stores the data after receiving the request and forwards the data to other storage data nodes in cluster A for backup; after the backup succeeds, a logical file name is generated.
It should be understood that the data operation request received by the storage data node arises as follows: a user, through a client, sends a write request to the master control node of the DFS, and the master control node allocates a list of writable data storage nodes to the client; the user then selects one storage data node in the list and sends it the data operation request. Note that a data write request and a data update request are essentially the same in the present application, so the update request is only briefly explained in describing this example.
That is, the following steps (not shown in the figure) are also performed before step 11:
step 10, the master control node of the distributed file system receives a request, entered by a user through a client, to write data;
alternatively, an update request can be entered, which carries the logical file name generated when the data was written;
step 20, the master control node returns an address list of data storage nodes to the client according to the write request;
alternatively, for an update, it returns the list of storage data nodes holding the data block number that was assigned to the logical file name during the original write;
step 30, the user selects a data storage node from the address list through the client and sends a data operation request to that node.
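Steps 10 to 30 can be sketched as a toy master control node. All names here (`MasterNode`, `request_write`, `request_update`, `remember`) are hypothetical, and the round-robin block assignment is an assumption made only to keep the sketch short; where a real NameServer would parse the block number out of the logical file name, this one keeps an explicit dict.

```python
import itertools


class MasterNode:
    """Toy master control node for steps 10-30 (all names are hypothetical)."""

    def __init__(self, nodes_per_block: dict[int, list[str]]):
        self.nodes_per_block = nodes_per_block        # block number -> node addresses
        self._next_block = itertools.cycle(sorted(nodes_per_block))
        self.block_of_name: dict[str, int] = {}        # logical file name -> block number

    def request_write(self) -> tuple[int, list[str]]:
        # Step 20 (write): assign a block number and return its writable nodes.
        block = next(self._next_block)
        return block, list(self.nodes_per_block[block])

    def request_update(self, logic_name: str) -> tuple[int, list[str]]:
        # Step 20 (update): reuse the block number assigned at the original write.
        block = self.block_of_name[logic_name]
        return block, list(self.nodes_per_block[block])

    def remember(self, logic_name: str, block: int) -> None:
        self.block_of_name[logic_name] = block


master = MasterNode({0: ["a1", "a2"], 1: ["a3", "a4"]})
block, candidates = master.request_write()   # (0, ['a1', 'a2'])
target = candidates[0]                        # step 30: the client picks one node
```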
Because data stored in a single DFS is backed up at the same time (the data storage node forwards the data to the other data storage nodes in the list for storage, so that multiple backups of the data exist in the DFS), data safety is not affected even if one of the storage data nodes holding the data goes down.
It should be understood that the data has already been split by the user at the client. For example, if the user wants to store 10M of data but the system is configured to store data in 2M units, the user splits the data at the client; the data may of course already be of a suitable size, for example smaller than 2M. When the master control node allocates the list of data storage nodes to the client, it has already determined into which data block of those nodes the data will be written (each data storage node is divided into multiple data blocks for storing data); that is, the block number is already allocated. The data operation request initiated by the user may also be an update; an update is essentially the same as a new write, since both write data. Because the data block number was allocated at the original write, the master control node determines, from the logical file name carried in the user's update request, the data storage nodes holding the originally allocated block number, and returns them to the client as the list.
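The client-side split from the 10M/2M example can be sketched as follows. Reading "M" as MiB and the `split_for_storage` name are assumptions; the text only fixes the 2M storage unit.

```python
MIB = 1024 * 1024
CHUNK_SIZE = 2 * MIB  # assumption: the text's "2M" storage unit read as 2 MiB


def split_for_storage(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data at the client into pieces no larger than the configured unit."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


pieces = split_for_storage(bytes(10 * MIB))
print(len(pieces))  # 5
```

Data already smaller than the unit comes back as a single piece, matching the "data smaller than 2M" case.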
When the storage data node receives the data storage request and writes the data, it assigns each piece of data a system-internal logical file name (LogicName). After the data has been written and backed up successfully, the storage data node returns the logical file name to the client, and the client records the mapping between the DFS master control node and the logical file name. Later, when the client accesses the data file, the master control node uses the logical file name to locate the data storage node on which the file is stored. The logical file name is a character string generated from the data block number, the file ID and other information, so it contains information indicating the region where the data is located, called file region information; the number of the data block on the storage data node that stores the data can be parsed from this file region information.
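The text does not specify the layout of the logical file name, only that it is built from the data block number and file ID and that the block number can be parsed back out. A minimal sketch under a hypothetical layout ('L' + 8 hex digits of block number + 16 hex digits of file ID) shows the round trip; the real format (compare the example name in step 51) is different and unspecified.

```python
def make_logic_name(block_no: int, file_id: int) -> str:
    # Hypothetical layout: 'L' + 8 hex digits of block number + 16 hex digits of file ID.
    if not (0 <= block_no < 16**8 and 0 <= file_id < 16**16):
        raise ValueError("field out of range")
    return f"L{block_no:08x}{file_id:016x}"


def parse_file_region(logic_name: str) -> int:
    """Recover the data block number (the file region information)."""
    if len(logic_name) != 25 or not logic_name.startswith("L"):
        raise ValueError(f"not a logical file name: {logic_name!r}")
    return int(logic_name[1:9], 16)


name = make_logic_name(5, 42)
print(name)                     # L00000005000000000000002a
print(parse_file_region(name))  # 5
```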
In addition to requesting the write, the client's data write request implicitly indicates which type of mirroring operation is to be performed on the data, for example a new write in this document.
Step 12, the storage data node generates a mirror record comprising the logical file name, the mirroring policy and the operation type of the data, wherein the logical file name includes file region information;
When the data storage node detects a data change, that is, new data has been written and its backup has completed, it immediately generates a mirror record. That is, after step 11 and before step 12, the method further includes: the storage data node monitoring whether data has been written.
The operation type (OperType) in the mirror record is determined by the type of data operation request initiated by the user. There are three types of operations on a data file: new write, update and delete. In this example, the operation type is a new write, performed in response to a data storage request initiated by the user.
The DFS predefines a mirroring policy (MirrorStrategy) for the data, which can be synchronous or asynchronous depending on the real-time requirement of the data; in this example the mirroring policy is synchronous mirroring because the data has a strict real-time requirement.
Through the above process, the storage data node obtains the mirror record O = {LogicName, OperType, MirrorStrategy}.
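The mirror record O = {LogicName, OperType, MirrorStrategy} maps naturally onto an immutable record with two enumerations. The Python names and the `realtime` flag in `make_record` are hypothetical; in the text the policy is predefined by the DFS rather than passed per call.

```python
from dataclasses import dataclass
from enum import Enum


class OperType(Enum):
    WRITE = "write"    # new write
    UPDATE = "update"
    DELETE = "delete"


class MirrorStrategy(Enum):
    SYNC = "sync"      # strict real-time requirement
    ASYNC = "async"    # relaxed real-time requirement


@dataclass(frozen=True)
class MirrorRecord:
    """O = {LogicName, OperType, MirrorStrategy}."""
    logic_name: str
    oper_type: OperType
    mirror_strategy: MirrorStrategy


def make_record(logic_name: str, oper_type: OperType, realtime: bool) -> MirrorRecord:
    # Hypothetical helper: a flag stands in for the DFS's predefined policy.
    strategy = MirrorStrategy.SYNC if realtime else MirrorStrategy.ASYNC
    return MirrorRecord(logic_name, oper_type, strategy)


record = make_record("L00000005000000000000002a", OperType.WRITE, realtime=True)
```

Freezing the record keeps it safe to hand to a queue or another thread without later mutation.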
Step 13, parsing the number of the data block storing the data from the file region information of the logical file name in the mirror record, and determining whether the data needs to be mirrored to cluster B by checking whether that block number falls within a mirror data region;
Mirror data regions are preset in cluster A. For example, the region mirrored to cluster B consists of data block numbers 0, 1, 3, 5, 7 and 9, and data files in these blocks must be mirrored to cluster B of the other data center; the non-mirror region consists of data block numbers 2, 4, 6, 8 and 10, and data files in these blocks exist only in the local cluster and are not mirrored.
That is, step 13 uses the file region information to determine whether mirroring is needed and, if so, which distributed file system is the mirroring destination. Mirror records are filtered accordingly: when mirroring is determined to be unnecessary, the mirror record can be deleted.
This example uses two DFSs for mirroring, but in practice multiple DFSs can be chosen. For example, if the distributed mirroring system includes four DFSs A, B, C and D, mirror data regions can be preset so that data blocks 0-10 are mirrored to system B, blocks 11-20 to system C, and blocks 21-30 to system D. The destination DFS is then determined by the region into which the data block number falls.
In this example, if the data block number is 5, the data needs to be mirrored to cluster B;
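The region check of step 13 can be sketched as a table lookup. The `MIRROR_REGIONS` table encodes the four-cluster example above; the table name and `mirror_destination` helper are hypothetical. A `None` result corresponds to a mirror record that is filtered out and deleted.

```python
from typing import Container, Mapping, Optional

# Hypothetical region table for the four-cluster example: destination DFS -> block numbers.
MIRROR_REGIONS: Mapping[str, Container[int]] = {
    "B": range(0, 11),   # blocks 0-10
    "C": range(11, 21),  # blocks 11-20
    "D": range(21, 31),  # blocks 21-30
}


def mirror_destination(block_no: int,
                       regions: Mapping[str, Container[int]] = MIRROR_REGIONS) -> Optional[str]:
    """Return the destination cluster, or None if the block is not mirrored (step 13)."""
    for cluster, blocks in regions.items():
        if block_no in blocks:
            return cluster
    return None


print(mirror_destination(5))                                # B
print(mirror_destination(4, {"B": {0, 1, 3, 5, 7, 9}}))     # None (two-cluster example)
```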
step 14, determining that the mirroring policy is synchronous mirroring;
step 15, acquiring, from the master control node of the mirror destination distributed file system determined by the file region information (that is, the master control node of cluster B), the list of destination storage data node addresses to be mirrored to;
The storage data node learns the address of the master control node of cluster B, the mirror cluster of cluster A, by reading a configuration file. Having obtained this address, cluster A sends a request over the network to the master control node of mirror cluster B, indicating that data needs to be synchronized. On receiving the request, the master control node queries which storage data nodes can currently accept writes, that is, which storage data nodes have data block number 5; if there are none, it creates data block 5 on several storage data nodes (the correspondence between the data block numbers on the mirror destination storage data nodes and those on the source storage data node can be set up with existing techniques and is not repeated here). The master control node of cluster B then allocates a list of writable destination storage data nodes to the storage data node of cluster A, that is, it returns the addresses of the destination storage data nodes to be mirrored to. Multiple addresses are returned here, for example 2;
When allocating writable storage data nodes, the master control node follows this principle: it distributes data storage requests evenly across different data storage nodes, and it can migrate data according to the capacity of each data storage node, so that used capacity stays balanced across the data storage nodes.
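The destination-side allocation of step 15 can be sketched as a toy master control node for cluster B. The `DestinationMaster` class is hypothetical, and using "least used capacity" as the balancing rule is an assumption standing in for the even-distribution principle above. It returns the existing holders of the block if any, and otherwise creates the block on the least-used nodes.

```python
class DestinationMaster:
    """Toy master control node of the mirror destination cluster (hypothetical)."""

    def __init__(self, used_capacity: dict[str, int]):
        self.used = dict(used_capacity)           # node address -> used capacity
        self.blocks: dict[int, list[str]] = {}    # block number -> nodes holding it

    def allocate(self, block_no: int, count: int = 2) -> list[str]:
        holders = self.blocks.get(block_no)
        if not holders:
            # No node holds this block yet: create it on the least-used nodes.
            holders = sorted(self.used, key=lambda a: (self.used[a], a))[:count]
            self.blocks[block_no] = holders
        return list(holders[:count])


master_b = DestinationMaster({"b1": 50, "b2": 10, "b3": 30})
print(master_b.allocate(5))  # ['b2', 'b3'], two addresses as in the text's example
```

A second request for block 5 returns the same holders, since the block now exists on them.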
Step 16, the storage data node retrieves the data according to the file region information in the logical file name;
Specifically, the storage data node parses, from the file region information in the logical file name, the number of the data block on this storage data node in which the data is stored, and reads the data from data block 5;
step 17, storing and backing up the data on the destination data storage nodes corresponding to the addresses in the list, according to the operation type in the mirror record.
Specifically, if the mirroring policy is synchronous mirroring, the storage data node selects one destination data storage node from the returned addresses and writes the data to it; after that node finishes writing, it forwards the data to the remaining data storage nodes in the returned list for backup, which completes the mirroring of the data in cluster B.
That is, when the mirroring policy is a synchronous mirroring policy and the operation type is write, step 17 may include the steps of:
step 171, selecting a destination data storage node address from the destination data storage node address list, and writing the data into a destination data storage node corresponding to the selected destination data storage node address;
step 172, the selected destination data storage node forwards the data to destination data storage nodes corresponding to the addresses of the remaining destination data storage nodes in the destination data storage node address list;
step 173, the destination data storage nodes corresponding to the addresses of the remaining destination data storage nodes store the data.
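Steps 171 to 173 amount to a write-then-forward chain on the destination side. The sketch below is a minimal in-memory model under assumptions: `DestNode`, `store_and_forward` and `sync_mirror_write` are hypothetical names, and network transfer is replaced by direct method calls.

```python
class DestNode:
    """In-memory stand-in for a destination data storage node (hypothetical)."""

    def __init__(self, address: str):
        self.address = address
        self.files: dict[str, bytes] = {}   # logical file name -> data

    def store(self, logic_name: str, data: bytes) -> None:
        self.files[logic_name] = data

    def store_and_forward(self, logic_name: str, data: bytes, peers: list["DestNode"]) -> None:
        self.store(logic_name, data)        # step 171: the selected node writes first
        for peer in peers:                  # step 172: it forwards to the remaining nodes
            peer.store(logic_name, data)    # step 173: each remaining node stores the data


def sync_mirror_write(dest_list: list[str], cluster: dict[str, DestNode],
                      logic_name: str, data: bytes) -> None:
    if not dest_list:
        raise ValueError("empty destination list")
    selected, *remaining = dest_list
    cluster[selected].store_and_forward(logic_name, data, [cluster[a] for a in remaining])


cluster_b = {a: DestNode(a) for a in ["b2", "b3"]}
sync_mirror_write(["b2", "b3"], cluster_b, "L5", b"payload")
```

The source node sends the data only once; fan-out inside cluster B reuses the DFS's own backup mechanism, which is the point made in the next paragraph.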
Step 17 (steps 171, 172 and 173) also relies on the backup mechanism of a single DFS, producing multiple copies of the data during mirroring. Therefore, if one storage data node in the mirrored cluster goes down, the safety of the mirrored data is not affected.
Therefore, by using two DFSs with identical functions as the mirroring system, the mirroring system keeps operating normally even if any single storage data node goes down, because there are many storage data nodes; and data safety is guaranteed by both the storage backups and the mirror backups, so the mirroring system achieves high reliability.
Fig. 3 is a flowchart of another embodiment of the distributed data mirroring method of the present application. In this embodiment the mirroring policy is asynchronous: because the data has a low real-time requirement, cluster A may set its mirroring policy to asynchronous mirroring. The flow differs from the above embodiment in that step 14 determines that the mirroring policy is asynchronous mirroring, and the following steps are performed between step 14 and step 15:
step 31, the storage data node pushes the mirror record into a file queue;
step 32, the storage data node checks the file queue in real time and, when mirror records are present, takes them out of the queue in order.
Apart from these steps, the asynchronous embodiment is identical to the synchronous one. Under asynchronous mirroring, the mirror record is first placed into the file queue; since real-time mirroring is not required, the storage data node checks in this way whether the file queue unit contains mirror records and, if so, takes them out and mirrors the data to the storage data nodes of other data centers in order.
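Steps 31 and 32 describe a FIFO queue drained by the storage data node. In this sketch the `FileQueue` class is hypothetical and the queue is held in memory with `collections.deque`, whereas the text's file queue would presumably be persisted. One design choice is worth noting: a record is removed only after its mirroring call returns, so a failed mirror leaves the record at the head of the queue for the next pass.

```python
from collections import deque
from typing import Any, Callable


class FileQueue:
    """In-memory stand-in for the file queue (a real one would be persisted to disk)."""

    def __init__(self, mirror_fn: Callable[[Any], None]):
        self._records: deque = deque()
        self._mirror_fn = mirror_fn

    def push(self, record: Any) -> None:
        # Step 31: enqueue the mirror record instead of mirroring immediately.
        self._records.append(record)

    def drain(self) -> int:
        # Step 32: take records out in order and mirror each one.
        done = 0
        while self._records:
            self._mirror_fn(self._records[0])  # mirror the head record first,
            self._records.popleft()            # then dequeue, so a failure keeps it
            done += 1
        return done


seen: list = []
queue = FileQueue(seen.append)
queue.push("record-1")
queue.push("record-2")
queue.drain()  # mirrors record-1 then record-2
```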
In addition, depending on the safety requirements of the data stored on the storage data nodes of cluster A, an administrator can use the configuration file to specify several storage data nodes in cluster B as mirror destinations. In that case the master control node of cluster B directly returns the configured address list to the storage data node of cluster A, without querying for storage data nodes that hold the same data block number.
The method uses multiple DFSs as the mirroring system, uses the storage data nodes of a DFS as data master servers, and mirrors data to any storage data nodes of the DFSs of multiple data centers as backups. In the present application, multiple backups of the data are made in cluster A, and the data is stored with multiple backups on the storage data nodes of the mirror destination cluster B, so the data is unaffected if one storage data node goes down or one data block is damaged. Because a DFS is used as the mirroring system, a downed storage data node or a damaged data block does not stop service. When performance reaches a bottleneck, capacity can be expanded as needed simply by adding machine disks (data storage nodes), and the expansion can be done online without stopping service.
Fig. 4 is a flowchart of another embodiment of the distributed data mirroring method of the present application. In step 17, when the storage data node performs the operation on the destination data storage node according to the mirroring policy and the operation type in the mirror record, the method further includes: sending a mirror record that includes a digest of the data to the destination data storage node corresponding to the selected address. Thus, in addition to the steps described in the synchronous and asynchronous mirroring embodiments above, the method also includes:
the selected destination data storage node performs the steps of:
step 22, receiving a mirror record that includes a digest of the data;
step 23, the destination data storage node computing a digest of the received data, wherein the digest algorithm is configured in advance, uniformly, on both the source storage data nodes and the mirror destination storage data nodes;
step 24, comparing the digest in the mirror record with the computed digest;
step 25, if they match, the mirroring succeeds;
step 26, if they do not match, returning a failure to the source storage data node, which performs the data mirroring again.
Although these steps are executed by the destination data storage node, every storage data node can execute them, since each storage data node of every DFS in the distributed mirroring system may act as a destination. By performing these steps, the destination data storage node ensures the consistency of the mirrored data; the distributed data mirroring method thus also provides a fast and efficient data verification mechanism.
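Steps 22 to 26 can be sketched with the standard `hashlib` module. The text only says the digest algorithm is configured in advance on both sides; SHA-256 is an assumption here, and `verify_mirror` and `mirror_with_retry` are hypothetical helpers. The retry bound in `mirror_with_retry` is also an assumption, since the text simply says mirroring is performed again.

```python
import hashlib
from typing import Callable

DIGEST_ALGORITHM = "sha256"  # assumption: the text only says the algorithm is pre-configured


def compute_digest(data: bytes, algorithm: str = DIGEST_ALGORITHM) -> str:
    return hashlib.new(algorithm, data).hexdigest()


def verify_mirror(record_digest: str, received: bytes) -> bool:
    """Steps 23-25 on the destination node: True means the mirroring succeeded."""
    return compute_digest(received) == record_digest


def mirror_with_retry(send: Callable[[bytes, str], bool], data: bytes,
                      max_attempts: int = 3) -> int:
    """Source side: resend until the destination reports a digest match (step 26)."""
    digest = compute_digest(data)
    for attempt in range(1, max_attempts + 1):
        if send(data, digest):
            return attempt
    raise RuntimeError("mirroring failed after retries")
```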
The above embodiments describe mirroring when a user writes data through a client; besides writes, data can also be deleted or updated. Mirroring of an update is the same as for a new write and is not described again.
The mirroring process of data deletion is briefly described below.
Fig. 5 is a flowchart of another embodiment of the distributed data mirroring method of the present application. This example describes a user requesting deletion of the data with a given logical file name; the steps are similar to those of the embodiment of Fig. 2, except that step 16 of that embodiment is not performed.
Deleting the data of a given logical file name comprises the following steps:
step 51, according to the recorded mapping between logical file names and master control nodes, the user sends, through the client, a request to delete the data of a given logical file name (for example, Plabcdhijklmnjkl) to the master control node of the DFS;
step 52, the master control node looks up the storage data nodes where the data is located according to the logical file name and returns a storage data node list to the user through the client;
step 53, the user selects a storage data node address and sends a data deletion request (that is, a data operation request) to the corresponding storage data node;
step 54, the storage data node receives the data deletion request, deletes the data on itself, and forwards the deletion request to the other storage data nodes in the list, which delete their copies;
step 55, after the data has been deleted successfully, the storage data node generates a mirror record comprising the logical file name, the mirroring policy and the operation type of the data, wherein the logical file name includes file region information (BS);
as in the embodiment of Fig. 2, the storage data node of cluster A generates a mirror record upon deletion;
after step 55, steps 13, 14 and 15 are performed as in the embodiment of Fig. 2, followed by:
step 56, deleting the data on the destination data storage nodes corresponding to the addresses in the list, according to the operation type in the mirror record.
Assuming the mirroring policy is synchronous, the storage data node performs the following in real time after the above steps:
step 561, the storage data node of the A cluster selects a destination data storage node address from the destination data storage node address list, and the destination data storage node corresponding to the selected destination data storage node address deletes the data;
step 562, the selected destination data storage node forwards the delete data request to the destination data storage nodes corresponding to the addresses of the remaining destination data storage nodes in the destination data storage node address list;
step 563, the destination data storage nodes corresponding to the addresses of the remaining destination data storage nodes delete the data.
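The write chain (steps 171 to 173) and the delete chain (steps 561 to 563) differ mainly in whether step 16 reads the local data. A minimal sketch dispatching on the operation type follows; `mirror_on_chain`, the string operation types and the dict-of-dicts cluster model are hypothetical simplifications.

```python
from typing import Callable, Dict, List, Optional


def mirror_on_chain(oper_type: str, dest_list: List[str],
                    cluster: Dict[str, Dict[str, bytes]], logic_name: str,
                    read_local: Callable[[str], bytes]) -> None:
    """Apply one mirror record along the destination list (hypothetical helper).

    cluster maps each destination address to that node's files (logical name -> bytes).
    """
    payload: Optional[bytes]
    if oper_type == "delete":
        payload = None                     # step 16 is skipped: nothing to read
    else:
        payload = read_local(logic_name)   # step 16: fetch the local copy
    selected, *remaining = dest_list
    for address in [selected, *remaining]:  # selected node first, then the rest
        files = cluster[address]
        if payload is None:
            files.pop(logic_name, None)     # steps 561-563
        else:
            files[logic_name] = payload     # steps 171-173 (write or update)


reads: List[str] = []
cluster_b = {"b1": {"L5": b"x"}, "b2": {"L5": b"x"}}
mirror_on_chain("delete", ["b1", "b2"], cluster_b, "L5",
                lambda n: reads.append(n) or b"")
```

After the delete both destination nodes are empty, and `read_local` was never called.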
When the storage data node performs delete mirroring, the process is the same as for the write operation in the embodiment of Fig. 2; therefore, the steps performed under asynchronous mirroring may also be included.
The present application further provides a storage data node for implementing the distributed data mirroring method, where the storage data node includes:
the request processing unit is used for receiving a data operation request, performing operation corresponding to the data operation request on data and generating a logic file name of the data;
the generating unit is used for generating a mirror image record comprising a logical file name, a mirror image strategy and an operation type of the data, wherein the logical file name comprises file area information;
the mirror image address acquisition unit is used for acquiring a list of destination data storage node addresses to be mirrored from a master control node of a mirror image destination distributed file system determined by the file region information;
and the data mirroring unit is used for performing corresponding operation of the data operation request on the destination data storage node corresponding to the destination data storage node address of the data in the list according to the mirroring strategy and the operation type in the mirroring record.
Further, the data storage node of the present application includes:
a mirror determining unit, configured to parse, from the file region information, the number of the data block storing the data and to judge whether that block number belongs to a mirrored data region; if it does, the unit notifies the mirror address acquisition unit.
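A possible shape for the mirror determining unit follows. The text does not define how the block number is encoded in the file region information or how a mirrored data region is represented. This sketch therefore assumes a hypothetical `"<region>/<block number>/<file id>"` name and models mirrored regions as inclusive block ranges.

```python
def block_number(logical_file_name: str) -> int:
    # Hypothetical format "<region>/<block number>/<file id>".
    _region, block, _file_id = logical_file_name.split("/", 2)
    return int(block)


def in_mirror_region(logical_file_name: str, mirror_ranges) -> bool:
    """Return True if the data block lies in a mirrored data region.

    mirror_ranges is an iterable of inclusive (first_block, last_block)
    pairs, an assumed representation of the mirrored data region.
    """
    n = block_number(logical_file_name)
    return any(lo <= n <= hi for lo, hi in mirror_ranges)
```

Only when `in_mirror_region` returns `True` would the node go on to ask the destination master for an address list, so blocks outside the mirrored regions are never replicated across clusters.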
In addition, the data storage node of the present application may further include:
a pushing unit, configured to push the mirror record into a file queue;
a file queue unit, configured to store mirror records;
and a take-out unit, configured to take mirror records out of the file queue in order.
When the mirroring policy is asynchronous mirroring, the mirror record is first pushed into the file queue unit. Because real-time mirroring is not required, the data mirroring unit of the storage data node periodically checks whether the file queue unit holds mirror records; if it does, the take-out unit takes them out in order, and the data mirroring unit mirrors the data according to each record.
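The asynchronous path can be sketched as an append-only file queue. The JSON-lines file format, the in-memory read offset, and the names `FileQueue`, `drain` and `demo` are assumptions made for illustration; the text only says that records are pushed into a file queue and later taken out in order.

```python
import json
import os
import tempfile


class FileQueue:
    """Append-only file queue of mirror records, one JSON object per line.

    Assumption: the consumption offset is kept in memory; a durable queue
    would also persist it so that a restarted node resumes correctly.
    """

    def __init__(self, path: str):
        self.path = path
        self.offset = 0
        open(path, "a").close()  # create the file if it does not exist

    def push(self, record: dict) -> None:
        # Pushing unit: append one record per line.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def take_all(self) -> list:
        # Take-out unit: return unread records in FIFO order.
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            lines = f.readlines()
            self.offset = f.tell()
        return [json.loads(line) for line in lines]


def drain(queue: FileQueue, mirror_fn) -> None:
    # One periodic check by the data mirroring unit: mirror each record.
    for record in queue.take_all():
        mirror_fn(record)


def demo() -> list:
    """Push three records, then run two checks of the queue."""
    mirrored = []
    with tempfile.TemporaryDirectory() as tmp:
        q = FileQueue(os.path.join(tmp, "mirror.queue"))
        for i in range(3):
            q.push({"logical_file_name": f"r1/{i}/f", "op": "store"})
        drain(q, mirrored.append)
        drain(q, mirrored.append)  # the second check finds nothing new
    return [r["logical_file_name"] for r in mirrored]
```

Writing records to a file rather than holding them only in memory lets pending mirror work survive a process restart, which suits a policy where mirroring may lag behind the original write.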
Preferably, the data storage node of the present application further includes:
and a mirror record sending unit, configured to send the mirror record, including a digest of the data, to the destination data storage node at the selected destination data storage node address.
The storage data node of the present application further includes:
a receiving unit, configured to receive a mirror record including a digest of the data;
a calculation unit, configured to calculate a digest of the received data;
a comparing unit, configured to compare the digest in the mirror record with the calculated digest;
and a mirror success marking unit, configured to mark the mirroring as successful if the two digests match.
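The digest check can be sketched as follows. The text does not name a digest algorithm; SHA-256 via Python's `hashlib` is an assumption, as are the record fields and the function names `make_record` and `verify`.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class MirrorRecordWithDigest:
    logical_file_name: str
    op_type: str
    digest: str  # hex digest of the data; SHA-256 is an assumption


def make_record(logical_file_name: str, data: bytes) -> MirrorRecordWithDigest:
    # Source side (mirror record sending unit): attach the data's digest.
    return MirrorRecordWithDigest(
        logical_file_name, "store", hashlib.sha256(data).hexdigest()
    )


def verify(record: MirrorRecordWithDigest, received: bytes) -> bool:
    # Destination side: the calculation unit recomputes the digest ...
    calculated = hashlib.sha256(received).hexdigest()
    # ... the comparing unit checks it, and success is marked on a match.
    return record.digest == calculated
```

Comparing fixed-size digests avoids sending the full data back to the source for verification.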
Since every storage data node of a DFS in the distributed mirroring system can also serve as a destination storage data node, all storage data nodes may include the above units. The destination data storage node uses them to ensure the consistency of the mirrored data, so the data storage node also provides a fast and efficient method of data verification.
The method uses multiple DFS systems as the mirroring system: storage data nodes in a DFS act as primary data servers, and any storage data node of the DFS systems in multiple data centers can serve as a mirror backup. In the present application, the data is stored with multiple backups in its cluster A and is likewise stored with multiple backups on the storage data nodes of its mirror destination cluster B, so the data is unaffected if one storage data node goes down or one data block is damaged. Because a DFS is used as the mirroring system, the failure of one storage data node or the corruption of one data block does not stop the service. When performance reaches a bottleneck, capacity can be expanded as required simply by adding machines with disks (data storage nodes), and this expansion can be performed online without stopping service.
It will be further appreciated by those of ordinary skill in the art that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of the two, and that, to clearly illustrate this interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above embodiments describe the objects, technical solutions and advantages of the present application in further detail. It should be understood that they are merely exemplary embodiments of the present application and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present application shall fall within the scope of the present application.

Claims (10)

1. A distributed data mirroring method, comprising the steps of:
a storage data node of a distributed file system receives a data operation request, performs an operation corresponding to the data operation request on data, performs a backup operation corresponding to the data operation request on the data, and generates a logical file name of the data, wherein the logical file name includes file region information;
generating a mirror record comprising the logical file name, a mirroring policy and an operation type of the data;
acquiring a list of destination data storage node addresses to be mirrored from a master control node of a mirror destination distributed file system determined by the file region information;
and, according to the mirroring policy and the operation type in the mirror record, performing the corresponding operation and backup on the data on the destination data storage nodes at the destination data storage node addresses in the list.
2. The distributed data mirroring method according to claim 1, further comprising, after the mirror record including the logical file name, the mirroring policy and the operation type of the data is generated and before the list of destination data storage node addresses to be mirrored is acquired from the master control node of the mirror destination distributed file system determined by the file region information:
parsing, according to the file region information, the data block number of the data block storing the data, and judging whether the data block number belongs to a mirrored data region; and if it does, performing the step of acquiring the list of destination data storage node addresses to be mirrored from the master control node of the mirror destination distributed file system determined by the file region information.
3. The distributed data mirroring method according to claim 1 or 2, further comprising, before receiving the data operation request, performing the corresponding operation on the data and generating the logical file name of the data:
receiving, by the master control node of the distributed file system in which the storage data node is located, information input by a user through a client for operating on the data;
returning, by the master control node, a storage data node address list to the client according to the input information;
and selecting the storage data node from the storage data node address list by the user through the client.
4. The distributed data mirroring method of claim 1, wherein, when the data operation request is a data storage request, the operation type is a storage operation; and when the mirroring policy is synchronous mirroring, the method further comprises, after the list of destination data storage node addresses to be mirrored is acquired from the master control node of the mirror destination distributed file system determined by the file region information, and before the operation corresponding to the data operation request is performed on the data on the destination data storage nodes at the destination data storage node addresses according to the mirroring policy and the operation type in the mirror record:
acquiring the data according to the file region information in the logical file name.
5. The distributed data mirroring method according to claim 4, wherein performing the operation on the data on the destination data storage nodes at the destination data storage node addresses according to the mirroring policy and the operation type in the mirror record comprises:
selecting a destination data storage node address from the destination data storage node address list, and writing the data into the destination data storage node at the selected destination data storage node address;
forwarding, by the selected destination data storage node, the data to the destination data storage nodes at the remaining addresses in the destination data storage node address list;
and storing the data by the destination data storage nodes at the remaining addresses.
6. The distributed data mirroring method according to claim 1, wherein, when the operation type is delete, performing the operation on the data on the destination data storage nodes at the destination data storage node addresses according to the mirroring policy and the operation type in the mirror record comprises:
selecting a destination data storage node address from the destination data storage node address list, and deleting the data from the destination data storage node at the selected destination data storage node address;
forwarding, by the selected destination data storage node, the data deletion request to the destination data storage nodes at the remaining addresses in the destination data storage node address list;
and deleting the data by the destination data storage nodes at the remaining addresses.
7. The distributed data mirroring method according to claim 5 or 6, further comprising, after the mirror record including the logical file name, the mirroring policy and the operation type of the data is generated and before the destination data storage node addresses to be mirrored are acquired from the master control node of the mirror destination distributed file system determined by the file region information:
judging whether the mirroring policy is an asynchronous policy and, if so, performing:
pushing the mirror record into a file queue;
and checking the mirror records in the file queue and taking them out of the file queue in order.
8. The distributed data mirroring method according to claim 5 or 6, further comprising, while performing the operation on the data on the destination data storage nodes at the destination data storage node addresses according to the mirroring policy and the operation type in the mirror record:
sending the mirror record including a digest of the data to the destination data storage node at the selected destination data storage node address.
9. The distributed data mirroring method of claim 8, further comprising:
receiving, by the destination data storage node at the selected destination data storage node address, the mirror record including the digest of the data;
calculating a digest of the data;
comparing the digest in the mirror record with the calculated digest;
and, if the two digests match, determining that the mirroring is successful.
10. A storage data node, comprising:
a request processing unit, configured to receive a data operation request, perform the operation corresponding to the request on the data, perform the corresponding backup operation on the data, and generate a logical file name for the data;
a generating unit, configured to generate a mirror record comprising the logical file name, a mirroring policy and an operation type of the data, wherein the logical file name includes file region information;
a mirror address acquisition unit, configured to acquire a list of destination data storage node addresses to be mirrored from a master control node of a mirror destination distributed file system determined by the file region information;
and a data mirroring unit, configured to perform, according to the mirroring policy and the operation type in the mirror record, the operation and backup corresponding to the data operation request on the destination data storage nodes at the destination data storage node addresses in the list.
HK13110052.9A 2013-08-28 Distributed data mirroring method and data store nodes HK1182804B (en)

Publications (2)

Publication Number Publication Date
HK1182804A HK1182804A (en) 2013-12-06
HK1182804B true HK1182804B (en) 2017-12-29
