Background
An existing storage system generally includes a management node and storage nodes, where the storage nodes are physical media for storing data, and the management node manages data stored in each storage node in the system.
In some storage systems with a small amount of stored data, an independent management node is not usually provided, but one or a few storage nodes are selected as managers, and the managers manage the data stored in each storage node in the storage system.
In the above scheme, the number of managers is small. If all the managers fail, the storage system breaks down and cannot continue to provide storage services; that is, the reliability of the storage system is poor.
Disclosure of Invention
The embodiment of the invention aims to provide a storage system and a storage node management method, which are used for improving the reliability of the storage system.
In order to achieve the above object, an embodiment of the present invention discloses a storage system, which includes at least two storage nodes, where each storage node runs a management service; the at least two storage nodes comprise a first storage node with role information as a manager and a second storage node with role information not as the manager;
the second storage node is used for judging, after detecting that the first storage node has failed, whether its own role information is a manager through an election mechanism under the management service running in each storage node; if yes, judging whether it is itself configured with address information for accessing the storage system, and if not, acquiring and configuring the address information.
Optionally, the second storage node may be further configured to determine, through the election mechanism, whether the second storage node is a temporary administrator:
if so, initiating an election, determining the role information of each storage node in the storage system according to voting information sent by each election participant, and identifying the role information of each storage node by using the management service running in each storage node, so that each storage node determines its own role information according to the identification;
if not, judging whether it is itself an election participant, and if so, sending voting information to the temporary manager after the temporary manager initiates the election.
Optionally, the second storage node may be further configured to, when it determines that its own role information is not a manager, determine whether it is configured with address information for accessing the storage system, and delete the address information if so.
Optionally, the second storage node may be further configured to receive, through a management service running on the second storage node, fault notification information indicating that the first storage node has a fault;
and the second storage node can also be used for judging whether the role information of the second storage node is a manager or not according to the role information identified in the management service operated by the second storage node after the election is finished.
Optionally, the second storage node may be further configured to receive data change information through the management service it runs, and to update the database it stores according to the data change information.
Optionally, the database stores index information of data stored in the storage system and node information of each storage node in the storage system;
the second storage node may be further configured to update the index information in the database according to index information of changed data contained in the data change information; and to update the node information in the database according to node information of a changed storage node contained in the data change information.
Optionally, the database stores index information of data stored in the storage system and node information of each storage node in the storage system;
the second storage node may be further configured to read index information of the storage data of each storage node and node information of each storage node;
comparing the read index information with the index information in the database to obtain a first comparison result;
comparing the read node information with the node information in the database to obtain a second comparison result;
and updating the database according to the first comparison result and the second comparison result.
Optionally, the database further stores a first version number corresponding to each piece of index information and a second version number corresponding to each piece of node information;
the second storage node may be further configured to update, according to the first comparison result, the index information in the database and the first version number corresponding to the index information;
and updating the node information in the database and a second version number corresponding to the node information according to the second comparison result.
In order to achieve the above object, the embodiment of the present invention further discloses a storage node management method, which is applied to storage nodes in a storage system, where management services are run in each storage node of the storage system; the method comprises the following steps:
after detecting that a storage node with the role information as a manager fails, judging whether the own role information is the manager or not through an election mechanism under management service running in each storage node;
if yes, judging whether the storage node itself is configured with address information for accessing the storage system, and if not, acquiring and configuring the address information.
Optionally, after detecting that the storage node whose role information is the manager fails, the method may further include:
judging, through the election mechanism, whether the storage node itself is a temporary manager:
if so, initiating election, determining the role information of each storage node in the storage system according to the voting information sent by each election participant, and identifying the role information of each storage node by using the management service operated in each storage node;
if not, judging whether the storage node itself is an election participant, and if so, sending voting information to the temporary manager after the temporary manager initiates the election.
Optionally, in a case that it is determined that the own role information is not the administrator, the method further includes:
judging whether the storage node itself is configured with address information for accessing the storage system, and if so, deleting the address information.
Optionally, a management service is run in a storage node in the storage system; the step of detecting that the storage node whose role information is the manager fails may include:
receiving fault prompt information through the management service run by the storage node itself, wherein the fault prompt information indicates that the storage node whose role information is a manager has failed;
the step of judging whether the own role information is a manager through an election mechanism under the management service running in each storage node may include:
after the election is finished, judging whether the own role information is a manager according to the role information identified in the management service run by the storage node itself.
Optionally, the method may further include:
receiving data change information through the management service run by the storage node itself;
and updating the database stored by the storage node itself according to the data change information.
Optionally, the database stores index information of data stored in the storage system and node information of each storage node in the storage system;
when the data change information includes index information of changed data, the updating the database may include:
updating the index information in the database according to the index information of the changed data contained in the data change information;
in a case where the data change information includes node information of a changed storage node, the step of updating the database stored by the storage node itself may include:
updating the node information in the database according to the node information of the changed storage node contained in the data change information.
Optionally, the database stores index information of data stored in the storage system and node information of each storage node in the storage system; the step of updating the database stored by the storage node itself may include:
reading index information of the storage data of each storage node and node information of each storage node;
comparing the read index information with the index information in the database to obtain a first comparison result;
comparing the read node information with the node information in the database to obtain a second comparison result;
and updating the database according to the first comparison result and the second comparison result.
Optionally, the database further stores a first version number corresponding to each piece of index information and a second version number corresponding to each piece of node information, and the step of updating the database according to the first comparison result and the second comparison result may include:
updating the index information in the database and the corresponding first version number according to the first comparison result;
and updating the node information in the database and a second version number corresponding to the node information according to the second comparison result.
Therefore, by applying the embodiment of the invention, after the storage node serving as the manager fails, any storage node may become a new manager by using the election mechanism under the management service running in each storage node; the new manager configures the address information for accessing the storage system, and the storage system can continue to provide the storage service; therefore, the reliability of the storage system is improved.
Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve the above technical problem, an embodiment of the present invention provides a storage system and a storage node management method, where the method may be applied to a storage node in the storage system. First, a detailed description will be given of a storage system according to an embodiment of the present invention.
The storage system may be as shown in fig. 1, comprising at least two storage nodes: storage node 1, storage node 2, …, storage node N; it may also include only storage node 1 and storage node 2, which is not particularly limited. A management service runs in each storage node; the at least two storage nodes comprise a first storage node whose role information is a manager and a second storage node whose role information is not a manager.
The second storage node may be configured to determine, after detecting that the first storage node has failed, whether its own role information is a manager through an election mechanism under the management service running in each storage node; if yes, to determine whether it is itself configured with address information for accessing the storage system, and if not, to acquire and configure the address information.
It should be noted that the management services running in the storage nodes may communicate with each other, for example, the management services may include a zookeeper service, and each storage node may detect that a storage node whose role information is a manager fails through an event notification mechanism of the zookeeper service.
Specifically, the second storage node may receive the failure prompt information through the zookeeper service it runs; receipt of the failure prompt information indicates that the storage node whose role information is a manager (the first storage node) has failed.
In this case, a new administrator may be elected through an election mechanism under the zookeeper service. As an embodiment, the second storage node may determine whether itself is a temporary administrator through the election mechanism:
if so, initiating election, determining the role information of each storage node in the storage system according to the voting information sent by each election participant, and identifying the role information of each storage node by using the management service operated in each storage node;
if not, judging whether it is itself an election participant, and if so, sending voting information to the temporary administrator after the temporary administrator initiates the election.
Specifically, through the zookeeper service, a temporary administrator may be created that may initiate elections. The temporary administrator may designate other storage nodes as election participants: it may designate all storage nodes except itself as election participants, or may arbitrarily designate an odd number of storage nodes as election participants; this is not specifically limited.
The storage nodes designated as election participants send voting information to the temporary administrator. The temporary administrator determines the role information of each storage node (namely, determines a manager and non-managers) according to the voting information received from each election participant. The temporary administrator then identifies the role information of each storage node by using the zookeeper service running in each storage node. Therefore, after the election is finished, each storage node can judge whether its own role information is a manager according to the role information identified in the zookeeper service it runs.
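As a non-limiting illustration, the tallying performed by the temporary administrator can be sketched as follows. The vote format (participant identifier mapped to candidate identifier), the function name `tally_election`, and the tie-break by smallest node identifier are all assumptions of this sketch; the embodiment does not specify them, and the returned mapping merely stands in for the role information identified in the management service.

```python
from collections import Counter
from typing import Dict, List


def tally_election(votes: Dict[str, str], online_nodes: List[str]) -> Dict[str, str]:
    """Return the role of every online node, given participant -> candidate votes."""
    # Ignore votes for nodes that are not online (for example the failed manager).
    counts = Counter(c for c in votes.values() if c in online_nodes)
    if not counts:
        raise ValueError("no valid votes received")
    # Assumed tie-break: most votes first, then the smallest node identifier.
    winner = min(counts, key=lambda c: (-counts[c], c))
    return {n: ("manager" if n == winner else "non-manager") for n in online_nodes}


votes = {"node-2": "node-3", "node-3": "node-3", "node-4": "node-2"}
roles = tally_election(votes, ["node-2", "node-3", "node-4"])
```

Here `node-3` receives two of the three votes, so it is marked as the manager and the other nodes as non-managers.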
Similarly, the second storage node may determine, after the election is finished, whether its role information is a manager, if so, determine whether it is configured with address information for accessing the storage system, and if not, acquire and configure the address information.
In this embodiment, if the second storage node determines that its own role information is not a manager, it may determine whether it is configured with address information for accessing the storage system, and if so, delete the address information.
The address information may be understood as a virtual IP address, that is, a virtual IP address provided by the storage system to the outside, through which the user can access the storage system. Only the manager configures the virtual IP address in the storage system, so if the original manager fails, a storage node becomes a new manager, and the new manager should configure the address information. Specifically, the address information may be directly obtained through a command line.
In addition, if a non-administrator is configured with the address information, unknown errors may occur in the storage system; thus, when a storage node determines that its own role information is not an administrator but it is configured with the address information, it should delete the address information from itself.
In an embodiment of the present invention, each storage node may store a database, and the storage node may update the database stored in the storage node itself after receiving the data change information through a management service running in the storage node itself.
Specifically, if a storage node in the storage system has data change, the storage node may notify the data change event to each storage node through an event notification mechanism of the zookeeper service. Each storage node can receive data change information through a zookeeper service operated by the storage node.
The data change may include a change of index information and may also include a change of node information. For example, when data is added, deleted or moved in the storage system, the index information of the data is changed; when a storage node in the storage system is newly added or deleted, the node information is changed. The database may include both index information and node information.
In one embodiment, the data change information may include index information of the changed data, so that the index information in the database may be updated based on the index information of the changed data included in the data change information.
In one embodiment, the data change information may include node information of the changed storage node, so that the node information in the database may be updated based on the node information of the changed storage node included in the data change information.
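The two embodiments above, in which the data change information itself carries the changed index information or node information, can be sketched as follows. The dictionary layout, the section names `"index"` and `"nodes"`, and the use of `None` to mark a deleted entry are assumptions made for illustration only; the embodiment does not fix a message format.

```python
from typing import Dict, Optional

Database = Dict[str, Dict[str, str]]
Change = Dict[str, Dict[str, Optional[str]]]


def apply_change(database: Database, change: Change) -> None:
    """Apply changed index and node information to the local database in place."""
    for section in ("index", "nodes"):
        for key, value in change.get(section, {}).items():
            if value is None:
                # Assumed convention: None means the entry was deleted.
                database[section].pop(key, None)
            else:
                database[section][key] = value


db: Database = {
    "index": {"file-a": "node-1"},
    "nodes": {"node-1": "online", "node-2": "online"},
}
apply_change(db, {
    "index": {"file-a": "node-3", "file-b": "node-1"},  # file-a moved, file-b added
    "nodes": {"node-3": "online", "node-2": None},      # node-3 added, node-2 deleted
})
```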
In one embodiment, the data change information may serve only as a notification and need not include index information or node information. After receiving the data change information, the storage node reads the index information of the data stored in each storage node and the node information of each storage node; compares the read index information with the index information in the database to obtain a first comparison result; compares the read node information with the node information in the database to obtain a second comparison result; and updates the database according to the first comparison result and the second comparison result.
For convenience of description, the following contents collectively refer to index information and node information as records.
Specifically, the records read from each storage node (denoted A) may be compared with the records B in the node's own database; if some content A1 exists only in A but not in B, A1 may be inserted into its own database.
If some content B1 exists only in B and not in A, and B1 is not present in the child nodes of the storage node to which B1 corresponds, B1 may be deleted from the local database.
If some content B2 exists only in the storage node corresponding to B2, and no child node exists under the storage node corresponding to B2, the data in A can be considered lost; in this case, the record in the storage node corresponding to B2 can be updated.
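A minimal sketch of the first two comparison cases is given below, with the records read from the storage nodes (A) and the records of the local database (B) modelled as dictionaries. The function name `reconcile` and the callback `present_under_owner`, which stands in for the check on the child nodes of the corresponding storage node, are hypothetical. The third case (lost data in A) is deliberately left out of this sketch.

```python
from typing import Callable, Dict, List, Tuple


def reconcile(read: Dict[str, str], local: Dict[str, str],
              present_under_owner: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Update `local` (B) from `read` (A); return the inserted and deleted keys."""
    # Content that exists only in A is inserted into the local database.
    inserted = sorted(k for k in read if k not in local)
    for key in inserted:
        local[key] = read[key]
    # Content that exists only in B is deleted, unless the child-node check
    # still finds it under the storage node it corresponds to.
    deleted = sorted(k for k in local if k not in read and not present_under_owner(k))
    for key in deleted:
        del local[key]
    return inserted, deleted


A = {"a1": "node-1", "shared": "node-2"}
B = {"shared": "node-2", "b1": "node-3", "b2": "node-3"}
result = reconcile(A, B, present_under_owner=lambda key: key == "b2")
```

In this example `a1` is inserted, `b1` is deleted, and `b2` is kept because the assumed child-node check still finds it.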
In this embodiment, a first version number corresponding to each piece of index information and a second version number corresponding to each piece of node information may be further stored in the database, so that the index information and the corresponding first version number in the database may be updated according to the first comparison result; and updating the node information in the database and a second version number corresponding to the node information according to the second comparison result.
Specifically, each time a record is updated, its corresponding version number may be increased by 1.
Since the version number of each record is stored, in the process of updating the records in the storage node corresponding to B2, the record with the maximum version number, that is, the latest record, can be determined from the records in each storage node, and the record in the storage node corresponding to B2 is updated according to the latest record. In addition, other storage nodes may update their own databases based on the latest records.
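The use of version numbers can be illustrated with a short sketch. Representing a record as a `(version number, content)` tuple, and the names `latest_record` and `bump`, are assumptions made for this example.

```python
from typing import Dict, Tuple

Record = Tuple[int, str]  # (version number, content); an assumed representation


def latest_record(copies: Dict[str, Record]) -> Record:
    """Pick the copy with the maximum version number among the storage nodes."""
    return max(copies.values(), key=lambda record: record[0])


def bump(record: Record, new_content: str) -> Record:
    """Updating a record increases its version number by 1."""
    version, _ = record
    return (version + 1, new_content)


copies = {
    "node-1": (3, "file-a on node-1"),
    "node-2": (5, "file-a on node-4"),
    "node-3": (4, "file-a on node-2"),
}
newest = latest_record(copies)
updated = bump(newest, "file-a on node-5")
```

Here the copy held by `node-2` has the maximum version number 5, so it is treated as the latest record, and a further update yields version number 6.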
It should be noted that, if the storage node goes online again after a failure occurs, the database of the storage node may also be updated according to the database of each storage node in the storage system.
One specific embodiment is described below:
Management services run on all storage nodes in the storage system, and the management services comprise zookeeper services. When a manager (a first storage node) in the storage system fails, the other storage nodes (second storage nodes) detect that the manager has failed through the event notification mechanism of the zookeeper service.
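The detection step can be sketched with a small in-memory stand-in for the event notification mechanism. The classes `EventBus` and `StorageNode` and the event name `manager_failed` are hypothetical and do not represent the zookeeper API; the sketch only shows that each remaining storage node learns of the failure by receiving a notification.

```python
from typing import Callable, Dict, List, Optional


class EventBus:
    """In-memory stand-in for the event notification mechanism (hypothetical)."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, event: str, callback: Callable[[str], None]) -> None:
        self._callbacks.setdefault(event, []).append(callback)

    def publish(self, event: str, node_id: str) -> None:
        # Deliver the event to every subscribed storage node.
        for callback in self._callbacks.get(event, []):
            callback(node_id)


class StorageNode:
    def __init__(self, node_id: str, bus: EventBus) -> None:
        self.node_id = node_id
        self.failed_manager: Optional[str] = None
        bus.subscribe("manager_failed", self.on_manager_failed)

    def on_manager_failed(self, manager_id: str) -> None:
        # Receiving the notification is how this node detects the failure.
        self.failed_manager = manager_id


bus = EventBus()
second_nodes = [StorageNode(name, bus) for name in ("node-2", "node-3")]
bus.publish("manager_failed", "node-1")
```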
Through the election mechanism under the zookeeper service, a new administrator can be elected. Specifically, through the zookeeper service, a temporary administrator may be created that may initiate elections. The temporary administrator may designate other storage nodes as election participants; specifically, it may designate all storage nodes except itself as election participants, or may arbitrarily designate an odd number of storage nodes as election participants, which is not specifically limited.
The storage nodes designated as election participants send voting information to the temporary administrator. The temporary administrator determines the role information of each storage node (namely, determines a manager and non-managers) according to the voting information received from each election participant. The temporary administrator then identifies the role information of each storage node by using the zookeeper service running in each storage node. Therefore, each storage node can judge whether its own role information is a manager according to the role information identified in the zookeeper service it runs.
The storage node that becomes the new administrator should be configured with the address information so that the storage system can provide management services to the user. Specifically, the address information may be directly obtained through a command line.
In addition, if a non-administrator is configured with the address information, unknown errors may occur in the storage system; thus, when a storage node determines that its own role information is not an administrator but it is configured with the address information, it should delete the address information from itself.
It should be noted that a storage system provided with a management node may also apply this scheme. The system can have two working modes: in the first working mode, the management node and the storage nodes operate normally; in the second working mode, the management node does not operate or operates as a storage node, and a manager among the storage nodes performs data management. The system can be switched between the two working modes.
Specifically, the management service running in the storage node may determine the current working mode of the storage system according to the configuration file, and if the current working mode is the second working mode, the present scheme may be executed. That is, the storage system provided with the management node can flexibly select whether to execute the scheme according to the actual situation.
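A minimal sketch of the working-mode check is given below. The INI-style configuration format, the section name `storage`, the option name `working_mode`, and the values `first` and `second` are assumptions made for illustration; the embodiment only states that the working mode is determined from a configuration file.

```python
import configparser


def scheme_enabled(config_text: str) -> bool:
    """Return True when the configuration selects the second working mode."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Assumed default: the first working mode, in which the management node runs.
    mode = parser.get("storage", "working_mode", fallback="first")
    return mode == "second"


second_mode = scheme_enabled("[storage]\nworking_mode = second\n")
first_mode = scheme_enabled("[storage]\nworking_mode = first\n")
```

With this layout, a storage node would run the election-based scheme only when `scheme_enabled` returns `True`.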
By applying the embodiment of the invention, after the storage node serving as the manager fails, any storage node may become a new manager by using the election mechanism under the management service running in each storage node; the new manager configures the address information for accessing the storage system, and the storage system can continue to provide the storage service; therefore, the reliability of the storage system is improved.
Fig. 2 is a schematic flow chart of a storage node management method according to an embodiment of the present invention, where the embodiment of the present invention shown in fig. 2 may be applied to a storage node in a storage system, and management services are run in each storage node in the storage system; the embodiment of the present invention shown in fig. 2 may also be applied to the second storage node in the embodiment of the present invention shown in fig. 1. Fig. 2 includes:
S201: after detecting that a storage node with the role information as a manager fails, judging whether the own role information is the manager or not through an election mechanism under management service running in each storage node; if so, S202 is performed.
As an embodiment, management services may be run in storage nodes in the storage system, and the management services run in the storage nodes may communicate with each other, for example, the management services may include a zookeeper service, and each storage node may detect that a storage node whose role information is a manager fails through an event notification mechanism of the zookeeper service.
Specifically, the storage node may receive the failure prompt information through the zookeeper service it runs; receipt of the failure prompt information indicates that the storage node whose role information is a manager has failed.
In this case, a new administrator may be elected through an election mechanism under the zookeeper service. As an implementation manner, the storage node may determine whether itself is a temporary administrator through the election mechanism:
if so, initiating election, determining the role information of each storage node in the storage system according to the voting information sent by each election participant, and identifying the role information of each storage node by using the management service operated in each storage node;
if not, judging whether it is itself an election participant, and if so, sending voting information to the temporary administrator after the temporary administrator initiates the election.
Specifically, through the zookeeper service, a temporary administrator may be created that may initiate elections. The temporary administrator may designate other storage nodes as election participants; specifically, it may designate all storage nodes except itself as election participants, or may arbitrarily designate an odd number of storage nodes as election participants, which is not specifically limited.
The storage nodes designated as election participants send voting information to the temporary administrator. The temporary administrator determines the role information of each storage node (namely, determines a manager and non-managers) according to the voting information received from each election participant. The temporary administrator then identifies the role information of each storage node by using the zookeeper service running in each storage node. Therefore, after the election is finished, each storage node can judge whether its own role information is a manager according to the role information identified in the zookeeper service it runs.
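The two ways of designating election participants can be sketched as follows. The function name `designate_participants`, the `odd_only` flag, and the choice to drop the last node when an odd count is required are assumptions of this sketch; the embodiment allows the odd-numbered subset to be chosen arbitrarily.

```python
from typing import List


def designate_participants(online_nodes: List[str], temporary_admin: str,
                           odd_only: bool = False) -> List[str]:
    """Choose election participants from the nodes other than the temporary administrator."""
    others = [n for n in online_nodes if n != temporary_admin]
    if odd_only and others and len(others) % 2 == 0:
        # Assumed choice: drop the last node to obtain an odd number of participants.
        return others[:-1]
    return others


nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
everyone_else = designate_participants(nodes, "node-1")
odd_subset = designate_participants(nodes, "node-1", odd_only=True)
```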
S202: judging whether the storage node itself is configured with address information for accessing the storage system; if not, executing S203.
S203: acquiring and configuring the address information.
In this embodiment, if the storage node determines that its own role information is not a manager, it may determine whether it is configured with address information for accessing the storage system, and if so, delete the address information.
The address information may be understood as a virtual IP address, that is, a virtual IP address provided by the storage system to the outside, through which the user can access the storage system. Only the manager configures the virtual IP address in the storage system, so if the original manager fails, a storage node becomes a new manager, and the new manager should configure the address information. Specifically, the address information may be directly obtained through a command line.
In addition, if a non-administrator is configured with the address information, unknown errors may occur in the storage system; thus, when a storage node determines that its own role information is not an administrator but it is configured with the address information, it should delete the address information from itself.
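Steps S201 to S203, together with the clean-up performed by a non-manager, can be sketched as one function. The `NodeState` class, the `acquire_address` callback (standing in for obtaining the virtual IP address, for example through a command line), and the address `192.0.2.10` are placeholders chosen for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class NodeState:
    role: str                      # "manager" or "non-manager", as identified after the election
    address: Optional[str] = None  # virtual IP address configured on this node, if any


def handle_election_result(node: NodeState,
                           acquire_address: Callable[[], str]) -> List[str]:
    """Run S201-S203 and the non-manager clean-up; return the steps that were taken."""
    steps = ["S201"]                 # S201: judge the node's own role information
    if node.role == "manager":
        steps.append("S202")         # S202: check whether the address is configured
        if node.address is None:
            node.address = acquire_address()
            steps.append("S203")     # S203: acquire and configure the address
    elif node.address is not None:
        node.address = None          # a non-manager must not keep the address
        steps.append("delete")
    return steps


new_manager = NodeState(role="manager")
trace = handle_election_result(new_manager, lambda: "192.0.2.10")
```

A newly elected manager without the address passes through all three steps, while a non-manager that still holds the address only removes it.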
By applying the embodiment shown in fig. 2 of the present invention, after the storage node serving as the manager fails, any storage node may become a new manager by using the election mechanism under the management service running in each storage node, and the new manager configures the address information for accessing the storage system, so that the storage system can continue to provide the storage service; therefore, the reliability of the storage system is improved.
In an embodiment of the present invention, each storage node may store a database, and the storage node may update the database stored in the storage node itself after receiving the data change information through a management service running in the storage node itself.
Specifically, if a storage node in the storage system has data change, the storage node may notify the data change event to each storage node through an event notification mechanism of the zookeeper service. Each storage node can receive data change information through a zookeeper service operated by the storage node.
The data change may include a change of index information and may also include a change of node information. For example, when data is added, deleted or moved in the storage system, the index information of the data is changed; when a storage node in the storage system is newly added or deleted, the node information is changed. The database may include both index information and node information.
In one embodiment, the data change information may include index information of the changed data, so that the index information in the database may be updated based on the index information of the changed data included in the data change information.
In one embodiment, the data change information may include node information of the changed storage node, so that the node information in the database may be updated based on the node information of the changed storage node included in the data change information.
In one embodiment, the data change information may serve only as a notification and need not include index information or node information. After receiving the data change information, the storage node reads the index information of the data stored in each storage node and the node information of each storage node; compares the read index information with the index information in the database to obtain a first comparison result; compares the read node information with the node information in the database to obtain a second comparison result; and updates the database according to the first comparison result and the second comparison result.
For convenience of description, the following contents collectively refer to index information and node information as records.
Specifically, the records read from each storage node (denoted A) may be compared with the records B in the node's own database; if some content A1 exists only in A but not in B, A1 may be inserted into its own database.
If some content B1 exists only in B and not in A, and B1 is not present in the child nodes of the storage node to which B1 corresponds, B1 may be deleted from the local database.
If some content B2 exists only in the storage node corresponding to B2, and no child node exists under the storage node corresponding to B2, the data in A can be considered lost; in this case, the record in the storage node corresponding to B2 can be updated.
In this embodiment, a first version number corresponding to each piece of index information and a second version number corresponding to each piece of node information may be further stored in the database, so that the index information and the corresponding first version number in the database may be updated according to the first comparison result; and updating the node information in the database and a second version number corresponding to the node information according to the second comparison result.
Specifically, each time a record is updated, its corresponding version number may be increased by 1.
Since the version number of each record is stored, in the process of updating the records in the storage node corresponding to B2, the record with the maximum version number, that is, the latest record, can be determined from the records in each storage node, and the record in the storage node corresponding to B2 is updated according to the latest record. In addition, other storage nodes may update their own databases based on the latest records.
It should be noted that, if the storage node goes online again after a failure occurs, the database of the storage node may also be updated according to the database of each storage node in the storage system.
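One way a storage node that comes back online might bring its database up to date is sketched below. Merging by the highest version number per record is an assumption of this sketch (the embodiment only states that the database may be updated according to the databases of the other storage nodes), and the name `catch_up` is hypothetical. Deleted records are not handled here.

```python
from typing import Dict, List, Tuple

Versioned = Dict[str, Tuple[int, str]]  # key -> (version number, content)


def catch_up(local: Versioned, peers: List[Versioned]) -> Versioned:
    """Merge peer databases into a copy of the local one, keeping the highest version."""
    merged = dict(local)
    for peer in peers:
        for key, record in peer.items():
            # Take the peer's record when it is new or has a larger version number.
            if key not in merged or record[0] > merged[key][0]:
                merged[key] = record
    return merged


stale_db: Versioned = {"file-a": (1, "node-1")}
peer_dbs: List[Versioned] = [
    {"file-a": (3, "node-2"), "file-b": (1, "node-3")},
    {"file-a": (2, "node-1")},
]
refreshed = catch_up(stale_db, peer_dbs)
```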
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Those skilled in the art will appreciate that all or part of the steps in the above method embodiments may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, referred to herein as a storage medium, such as ROM/RAM, a magnetic disk, or an optical disk.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.