CN102868754B - Method, node apparatus and system for implementing high availability of cluster storage
- Publication number
- CN102868754B
- Authority
- CN
- China
- Prior art keywords
- node
- information
- storage
- cluster
- module
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Hardware Redundancy (AREA)
Abstract
The invention discloses a method, a node apparatus and a system for implementing high availability of cluster storage. The method includes: triggering a takeover event according to received node state information; acquiring the volume information, storage device information and SAN storage information of the failed node and generating local configuration information; and mounting the storage devices in the SAN cluster and starting a distributed file system service routine, whereupon the volumes on the storage devices are restored to use. During resource switching between nodes, the invention takes over not only the IP and service process resources of the failed node but also its storage software service processes and physical storage resources. It supports the NFS/CIFS/HTTP/FTP/iSCSI protocols and the PanaFS protocol, and it uses a TCP/IP connection reestablishment technique to take over the failed node transparently, so that no service interruption occurs during takeover.
Description
Technical Field
The present invention relates to the technical field of network storage systems, and in particular, to a method, a node apparatus, and a system for implementing high availability of cluster storage.
Background
Against the background of cloud storage and big data, data volumes are growing explosively. According to research, the digital universe is expected to reach 35.2 ZB in 2020, a 44-fold leap over the 0.8 ZB of 2009, and more than 80% of that data is unstructured. The surge of data from data-intensive applications such as high-performance computing, medical imaging, oil and gas exploration, digital media and the social Web continually poses new and serious challenges for storage methods. Cluster storage is a scale-out storage architecture whose capacity and performance expand linearly, and it has been widely accepted by global markets. Besides high performance and high scalability, cluster storage also offers high availability, which is particularly critical to enterprise core service systems because it ensures the continuity of key services.
Prior-art cluster storage solutions address availability mainly through redundancy, including replication, erasure coding, and Active/Standby or Active/Active high availability (HA) technology. Replication effectively improves data availability by adding copies, but storage utilization is low (the reciprocal of the number of copies) and data management becomes more complex. Erasure coding improves storage availability through redundant encoding, with lower space overhead and data redundancy and therefore higher storage utilization, but the encoding is complex and computationally expensive, which reduces service performance; it is suited to clusters with a large number of nodes. Active/Standby HA also obtains high availability through redundancy, but it wastes storage resources severely. Active/Active HA monitors the nodes and switches the resources of a failed node (IP, service processes, service data, etc.) to a normal node, so that the system as a whole provides service continuously. HA technology improves availability, provides load balancing and achieves high resource utilization. Its main problems are that resource switching can interrupt service, and that usually only IP and service process resources are taken over, while service data and physical storage resources must be managed by an external system.
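The utilization trade-off between replication and erasure coding can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the 8+2 erasure-code layout are assumptions chosen for the example, not values taken from the patent.

```python
def replication_utilization(copies: int) -> float:
    """Usable fraction of raw capacity when every block is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return 1.0 / copies


def erasure_utilization(data_blocks: int, parity_blocks: int) -> float:
    """Usable fraction for a k+m erasure code: k data blocks plus m parity blocks."""
    if data_blocks < 1 or parity_blocks < 0:
        raise ValueError("need data_blocks >= 1 and parity_blocks >= 0")
    return data_blocks / (data_blocks + parity_blocks)


# Three full copies leave one third of the raw capacity usable.
three_copies = replication_utilization(3)
# A hypothetical 8+2 layout (8 data blocks, 2 parity blocks) leaves 80% usable.
ec_8_2 = erasure_utilization(8, 2)
```

Under these assumptions, erasure coding keeps far more capacity usable than three-way replication, at the price of the encoding cost described above.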
Disclosure of Invention
The invention aims to provide a method, a node apparatus and a system for implementing high availability of cluster storage that, during resource switching between nodes, take over not only the IP and service process resources of a failed node but also its storage software service processes and physical storage resources.
To achieve this purpose, the invention adopts the following technical solution:
A method for implementing high availability of cluster storage, the method comprising:
triggering a takeover event according to the received node state information;
acquiring the volume information, storage device information and Storage Area Network (SAN) storage information of the failed node and generating local configuration information;
and mounting the storage devices in the SAN cluster and starting a distributed file system service routine, whereupon the volumes on the storage devices are restored to use.
Before triggering the takeover event according to the received node state information, the method further includes:
responding to the backup request of the failed node, and receiving the state information of the failed node.
Before the failed node sends the backup request, the method further includes:
receiving monitoring information through a periodically triggered monitoring event, and determining the state of the local service;
when the service state is abnormal, determining a backup node in the SAN cluster according to a round-robin scheduling algorithm;
and sending the local state information to other nodes and sending a backup request to the backup node.
The method further includes:
sending a message to other nodes in the SAN cluster to update the volume information and the storage device information.
If the failed node has been repaired, the method further includes:
taking the resources taken over by the backup node back to the local node;
sending a message to other nodes in the SAN cluster to update the volume information and the storage device information;
and sending a resource release message to the backup node.
When a node takes over resources, the method further includes:
acquiring the connection information from before the takeover, constructing an acknowledgement (ACK) request message whose sequence number is zero (sequence = 0), and sending the ACK request message to a previously connected client;
receiving an ACK response message from the client, wherein sequence = N in the ACK response message;
sending a reset (RST) request to the client, notifying the client to reestablish the TCP connection;
and establishing a TCP connection with the client once the client has reestablished its transport port.
A node apparatus for implementing high availability of cluster storage includes a timer module, an event processing module, a monitoring module and a communication module, wherein:
the timer module is used for triggering monitoring events to the event processing module, so that the resources, services and the like of the node are monitored at regular intervals;
the monitoring module is used for monitoring the state of a specified service and returning the state information of the service to the event processing module;
the communication module is used for information transmission and data synchronization among the node apparatuses;
and the event processing module is used for receiving the information returned by each module in the apparatus and scheduling the next operation according to that information.
The node apparatus further includes a takeover module and a release module, wherein:
the takeover module is used for taking over the resources of the failed node when the node apparatus acts as a backup node, the resources including SAN storage devices, volumes and the corresponding distributed file system services;
and the release module is used for releasing the resources taken over by the takeover module after the failed node recovers.
After the node apparatus, acting as a backup node, takes over the resources of the failed node, it reestablishes the TCP connections with the clients that were connected to the failed node.
A system for implementing high availability of cluster storage includes at least two node apparatuses and the storage devices in a SAN.
With the technical solution of the invention, not only the IP and service process resources of the failed node but also its storage software service processes and physical storage resources are taken over during resource switching between nodes. The NFS/CIFS/HTTP/FTP/iSCSI protocols and the PanaFS protocol are supported, and a TCP/IP connection reestablishment technique is used to take over the failed node transparently, so that no service interruption occurs during takeover.
Drawings
Fig. 1 is a schematic structural diagram of a node apparatus for implementing high availability of cluster storage according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a node apparatus for implementing high availability of cluster storage as a standby node according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a system for implementing high availability of cluster storage according to an embodiment of the present invention.
Fig. 4 is a flowchart of a method for implementing high availability of cluster storage according to an embodiment of the present invention.
Fig. 5 is a flowchart of a backup request issued by a failed node in an embodiment of the present invention.
Fig. 6 is a schematic diagram of an information interaction process in the system after the failed node is repaired in the embodiment of the present invention.
Fig. 7 is a schematic process diagram of a node taking over service resources and a client reestablishing a TCP connection in an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings.
Fig. 1 is a schematic structural diagram of a node apparatus for implementing high availability of cluster storage according to an embodiment of the present invention. The node apparatus includes a timer module, an event processing module, a monitoring module and a communication module, wherein:
The timer module is used for triggering monitoring events to the event processing module, so that the resources, services and the like of the node are monitored at regular intervals.
The HA system triggers events through a timer and monitors the resources, services and the like of the node at regular intervals; if a state changes, it synchronizes the information with the other nodes and triggers the corresponding event for processing. The timer module triggers defined events at set time intervals, including periodic monitoring events and periodic keepalive events.
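A timer module of this kind can be sketched as a small scheduler that fires named events at fixed intervals. The example below is a minimal illustration that runs against a virtual clock so it stays deterministic; the class name `TimerModule`, its methods and the interval values are assumptions made for this sketch and do not come from the patent.

```python
import heapq
from typing import Callable, List, Tuple


class TimerModule:
    """Fires named events at fixed intervals against a virtual clock (hypothetical API)."""

    def __init__(self, dispatch: Callable[[str, float], None]) -> None:
        self._dispatch = dispatch
        # Each entry is (due time, event name, interval).
        self._queue: List[Tuple[float, str, float]] = []

    def register(self, event: str, interval: float) -> None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self._queue, (interval, event, interval))

    def run_until(self, deadline: float) -> None:
        # Pop events in due-time order and re-arm each one after it fires.
        while self._queue and self._queue[0][0] <= deadline:
            due, event, interval = heapq.heappop(self._queue)
            self._dispatch(event, due)
            heapq.heappush(self._queue, (due + interval, event, interval))


fired: List[Tuple[str, float]] = []
timer = TimerModule(lambda event, now: fired.append((event, now)))
timer.register("monitor", 2.0)      # illustrative interval
timer.register("keepalive", 5.0)    # illustrative interval
timer.run_until(6.0)
```

In a real node the dispatch callback would hand the event to the event processing module, and the virtual clock would be replaced by wall-clock timers.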
The monitoring module is used for monitoring the state of a specified service and returning the state information of the service to the event processing module.
The monitoring module checks the state of the specified service periodically and returns the state to the event processing module as a return value: 0 if the service is normal, and a non-zero value otherwise.
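The return-value convention can be illustrated with a short sketch. The probe functions, the service names and the specific non-zero codes (1 for an unhealthy result, 2 for a probe that crashes) are assumptions for this example; the patent only specifies 0 for normal and non-zero for abnormal.

```python
from typing import Callable, Dict


def check_service(probe: Callable[[], bool]) -> int:
    """Return 0 if the probe reports a healthy service, a non-zero value otherwise."""
    try:
        return 0 if probe() else 1
    except Exception:
        # A probe that raises is also treated as an abnormal service.
        return 2


def monitor(probes: Dict[str, Callable[[], bool]]) -> Dict[str, int]:
    """Run every probe once and collect the return values by service name."""
    return {name: check_service(probe) for name, probe in probes.items()}


def broken_probe() -> bool:
    raise ConnectionError("service port unreachable")


# Hypothetical probes standing in for real health checks.
status = monitor({
    "nfs": lambda: True,
    "cifs": lambda: False,
    "ftp": broken_probe,
})
```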
The communication module is used for information transmission and data synchronization among the node apparatuses, including periodic heartbeats and data synchronization.
The event processing module is used for receiving the information returned by each module in the apparatus and scheduling the next operation according to that information.
When the node shown in Fig. 1 acts as a backup node, the node apparatus further includes a takeover module and a release module, as shown in Fig. 2, wherein:
The takeover module is used for taking over the resources of the failed node when the node apparatus acts as a backup node; the resources include SAN storage devices, volumes and the corresponding distributed file system services.
The takeover module is responsible for taking over the resources of the failed node, mainly the SAN storage and the corresponding distributed file system service. It first acquires the SAN storage information, storage device information and volume information managed by the failed node, then generates the complete configuration information on the local node, mounts the storage devices in the SAN cluster and starts the distributed file system service, after which the volumes established on the storage devices can be restored to use.
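The takeover sequence can be sketched as three steps over in-memory data. This is a simulation only: no real SAN device is mounted and no file system service is started, and the names `NodeResources`, `BackupNode` and `take_over` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeResources:
    san_targets: List[str]
    devices: Dict[str, str]          # device id -> SAN target it lives on
    volumes: Dict[str, List[str]]    # volume name -> device ids it spans


@dataclass
class BackupNode:
    name: str
    mounted: List[str] = field(default_factory=list)
    active_volumes: List[str] = field(default_factory=list)
    service_running: bool = False

    def take_over(self, failed: NodeResources) -> NodeResources:
        # Step 1: build the local configuration from the failed node's metadata.
        local = NodeResources(
            list(failed.san_targets),
            dict(failed.devices),
            {name: list(devs) for name, devs in failed.volumes.items()},
        )
        # Step 2: "mount" each device (a real node would attach the SAN device here).
        for device in sorted(local.devices):
            self.mounted.append(device)
        # Step 3: start the file system service; volumes whose devices are all
        # mounted become usable again.
        self.service_running = True
        for volume, devs in local.volumes.items():
            if all(dev in self.mounted for dev in devs):
                self.active_volumes.append(volume)
        return local


failed = NodeResources(
    san_targets=["san-0"],
    devices={"lun-1": "san-0", "lun-2": "san-0"},
    volumes={"vol-a": ["lun-1"], "vol-b": ["lun-1", "lun-2"]},
)
backup = BackupNode("node-b")
local_config = backup.take_over(failed)
```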
The distributed file system service may use a distributed storage system such as Lustre, Panasas, Ceph, pNFS or PanaFS. The embodiment of the invention preferably uses the PanaFS distributed file system, which uses the private PanaFS protocol for data access, serves as the basis for access protocols such as CIFS/NFS/FTP/HTTP, and provides a unified shared storage space. PanaFS is a protocol compatible with the POSIX (Portable Operating System Interface) standard and offers high scalability, high availability, high performance and high efficiency.
The release module is used for releasing the resources taken over by the takeover module after the failed node recovers.
The event processing module of the node apparatus acts like a scheduler: it receives the information returned by the different modules and decides the next operation on that basis. It mainly performs the following steps:
1. Receive the message from the timer and call the monitoring module.
2. Receive the message from the monitoring module. A return value of 0 means the service is normal. The result is compared with the previous monitoring result: if the previous state was abnormal, the current node has now been repaired and the resource recovery process needs to be executed, as shown in Fig. 3. If a non-zero value is returned, the service of the node is abnormal and the following steps are executed:
(1) On the node, select one node as the backup node using a round-robin scheduling algorithm;
(2) Call the communication module to send the state information of the node to the other nodes;
(3) Send a message to the backup node. The backup node runs the takeover module after receiving the message.
3. Receive messages sent by other nodes. If the node has been selected as the backup node, run the takeover module. If a message that the failed node has been repaired is received, run the release module.
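The scheduling steps above can be sketched as a small dispatcher that records the action it would take for each input. The class `EventProcessor`, the action strings and the message kinds are hypothetical names for this illustration; the sketch also assumes that the failure steps run only when the state first turns abnormal, which the patent does not state explicitly.

```python
from itertools import cycle
from typing import Iterator, List


class EventProcessor:
    """Hypothetical scheduler that turns module results into the next actions."""

    def __init__(self, node: str, peers: List[str]) -> None:
        if not peers:
            raise ValueError("at least one peer node is required")
        self.node = node
        self._round_robin: Iterator[str] = cycle(list(peers))
        self.previous_status = 0
        self.actions: List[str] = []

    def on_monitor_result(self, status: int) -> None:
        if status == 0 and self.previous_status != 0:
            # Service is healthy again after a failure: start resource recovery.
            self.actions.append("recover_resources")
        elif status != 0 and self.previous_status == 0:
            # Assumption of this sketch: react only on the transition to abnormal,
            # so repeated failed checks do not pick a new backup node each time.
            backup = next(self._round_robin)
            self.actions.append(f"broadcast_state:{self.node}=abnormal")
            self.actions.append(f"backup_request:{backup}")
        self.previous_status = status

    def on_peer_message(self, kind: str) -> None:
        if kind == "backup_request":
            self.actions.append("run_takeover_module")
        elif kind == "node_repaired":
            self.actions.append("run_release_module")


processor = EventProcessor("node-a", ["node-b", "node-c"])
for result in (0, 3, 3, 0):
    processor.on_monitor_result(result)
processor.on_peer_message("backup_request")
processor.on_peer_message("node_repaired")
```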
Fig. 3 is a schematic structural diagram of a system for implementing high availability of cluster storage according to an embodiment of the present invention.
In a cluster storage system based on a SAN architecture, the back-end storage uses mid-range to high-end disk array subsystems that support RAID (redundant array of independent disks) levels such as 0, 1, 5, 6 and 10 and are connected to each cluster node through Fibre Channel (FC) interfaces. The SAN disk arrays protect data through the different RAID levels and provide high availability through redundancy, at the cost of some storage utilization. In such an architecture, providing cluster-level high availability through replication or erasure coding would further reduce storage utilization or greatly reduce the performance of the cluster system. When a cluster node server fails, the back-end SAN storage is still working normally, and the data stored on it is complete and consistent. One of the other normally working cluster nodes can therefore be selected to take over the resources and services of the failed node and continue to provide data service, ensuring service continuity.
The HA method of the invention differs from prior HA solutions in that it targets cluster storage systems based on a SAN architecture and adopts an Active/Active HA architecture: it takes over not only the IP and service process resources of the failed node but also its storage software service processes and physical storage resources, and it supports the NFS/CIFS/HTTP/FTP/iSCSI protocols and the PanaFS protocol. By using a TCP/IP connection reestablishment technique, the failed node is taken over transparently and no service interruption occurs during takeover.
The method leaves the storage utilization and system performance of the cluster storage system unaffected, transparently takes over the complete system resources and provides higher system availability. Its main design principles are as follows:
1. When a node goes down or becomes abnormal and can no longer provide data storage service to upper-layer applications, the backup node needs to take over the SAN storage devices connected to that node and start the corresponding services, so that front-end applications can still perform data storage operations normally.
2. To balance the load across the nodes in the system and avoid overloading the backup node, the SAN storage that was taken over needs to be returned to the failed node once it has been repaired.
3. The takeover and recovery processes must have no obvious effect on front-end data storage and must not interrupt the distributed file system service, so that takeover and recovery are transparent.
4. Backup node selection: one of the nodes currently working normally is selected as the backup node using a round-robin method.
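Round-robin selection restricted to healthy nodes can be sketched as follows. The class name `RoundRobinSelector`, the status dictionary and the node names are assumptions for this example; the patent names the algorithm but not a data structure.

```python
from typing import Dict, List, Optional


class RoundRobinSelector:
    """Cycles through the cluster nodes, skipping failed or excluded ones."""

    def __init__(self, nodes: List[str]) -> None:
        self._nodes = list(nodes)
        self._cursor = 0

    def select(self, healthy: Dict[str, bool], exclude: str) -> Optional[str]:
        # Scan at most one full turn, starting from the node after the last pick.
        for offset in range(len(self._nodes)):
            index = (self._cursor + offset) % len(self._nodes)
            candidate = self._nodes[index]
            if candidate != exclude and healthy.get(candidate, False):
                self._cursor = (index + 1) % len(self._nodes)
                return candidate
        return None  # no healthy node is available to act as backup


selector = RoundRobinSelector(["n1", "n2", "n3", "n4"])
healthy = {"n1": True, "n2": False, "n3": True, "n4": True}
# n2 has failed, so successive selections rotate over the remaining nodes.
picks = [selector.select(healthy, exclude="n2") for _ in range(4)]
```

Rotating the starting point after each pick spreads successive takeovers across the healthy nodes instead of always loading the first one.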
When a node in the cluster fails, the HA system uses a round-robin algorithm to select one of the nodes in the OK state to take over the IP address of the failed node. A takeip event is therefore triggered on the selected node; the node on which the takeip event occurs acts as the backup node and performs the failure takeover operation. After the failed node recovers, the IP address that drifted away drifts back, and at the same time a takeip event is triggered and the failure recovery operation is performed. Existing HA solutions often handle only the availability of the services on the cluster nodes, such as CIFS, NFS, HTTP and FTP, but not the TCP/IP connections or sessions that were established with the node before the failure, which can interrupt service during takeover and prevents transparent takeover. The present method uses a TCP spoofing technique: the takeover node actively reestablishes the connections with the clients that had established connections, achieving transparent takeover. The procedure for reestablishing a connection is as follows:
1. The new node (the takeover node) acquires the previous connection information from shared storage, constructs an ACK request message whose sequence number is zero (sequence = 0), and sends it to the client;
2. After receiving the request, the client sends an ACK response carrying the correct sequence number (sequence = N);
3. After the new node obtains the correct sequence number, it sends an RST request, notifying the client to reestablish the TCP connection;
4. The client reestablishes the transport port, and the TCP connection is reestablished.
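The four steps can be traced with a toy model of the exchange. This is a pure simulation of the simplified message flow described above, not real packet handling: no sockets or raw packets are used, the classes `Segment` and `Client` are hypothetical, and the sequence value 4821 is an arbitrary stand-in for N.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    flags: str      # "ACK", "RST" or "SYN"
    sequence: int


class Client:
    """Toy client that tracks the sequence number it expects from the server."""

    def __init__(self, expected_sequence: int) -> None:
        self.expected_sequence = expected_sequence
        self.state = "ESTABLISHED"

    def receive(self, segment: Segment) -> Optional[Segment]:
        if segment.flags == "ACK" and segment.sequence != self.expected_sequence:
            # Unexpected sequence: answer with an ACK carrying the expected value.
            return Segment("ACK", self.expected_sequence)
        if segment.flags == "RST" and segment.sequence == self.expected_sequence:
            # A reset with the right sequence closes the connection; reconnect.
            self.state = "SYN_SENT"
            return Segment("SYN", 0)
        return None


def reestablish(client: Client) -> List[Segment]:
    """Run the takeover node's side of the exchange and return the message trace."""
    probe = Segment("ACK", 0)                      # step 1: ACK with sequence = 0
    trace = [probe]
    reply = client.receive(probe)                  # step 2: client answers with N
    sequence = reply.sequence if reply is not None else probe.sequence
    if reply is not None:
        trace.append(reply)
    reset = Segment("RST", sequence)               # step 3: RST with the learned N
    trace.append(reset)
    syn = client.receive(reset)                    # step 4: client reconnects
    if syn is not None:
        trace.append(syn)
        client.state = "ESTABLISHED"               # handshake details omitted
    return trace


client = Client(expected_sequence=4821)
trace = reestablish(client)
summary = [(segment.flags, segment.sequence) for segment in trace]
```

The zero-sequence probe exists only to make the client reveal the sequence number it currently expects, which the takeover node needs before its reset will be accepted.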
Fig. 4 is a flowchart of a method for implementing high availability of cluster storage according to an embodiment of the present invention, where the method includes:
S401, triggering a takeover event according to the received node state information. The backup node receives the state information of the failed node through the communication module, passes it to the event processing module and triggers a takeover event.
S402, acquiring the volume information, storage device information and SAN storage information of the failed node and generating local configuration information. The local configuration is generated from the volume information, storage device information and SAN information used by the failed node, so that the corresponding storage resources can be mounted during takeover.
S403, mounting the storage devices in the SAN cluster and starting the PanaFS service routine, whereupon the volumes on the storage devices are restored to use.
In step S401, before the backup node triggers the takeover event according to the received node state information, it responds to the backup request of the failed node and receives the state information of the failed node.
Before the failed node sends the backup request, as shown in Fig. 5, the method further includes the following steps:
S501, receiving monitoring information through a periodically triggered monitoring event, and determining the state of the local service;
S502, when the service state is abnormal, determining a backup node in the SAN cluster according to a round-robin scheduling algorithm;
S503, the failed node sends the local state information to other nodes and sends a backup request to the backup node.
After the backup node has successfully taken over the storage resources, it sends a message to the other nodes in the SAN cluster to update the volume information and the storage device information.
When the failed node has been repaired, as shown in Fig. 6, it takes the resources taken over by the backup node back to the local node, sends a message to the other nodes in the SAN cluster to update the volume information and the storage device information, and sends a resource release message to the backup node. After receiving the resource release message, the backup node releases the resources it took over previously.
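The failback sequence can be sketched over in-memory state. The `Node` class, its `held` and `owner_view` fields and the resource names are hypothetical; real nodes would exchange these updates through the communication module rather than by direct assignment.

```python
from typing import Dict, List


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.held: List[str] = []              # resources this node currently serves
        self.owner_view: Dict[str, str] = {}   # resource -> node believed to serve it


def fail_back(repaired: Node, backup: Node, cluster: List[Node],
              resources: List[str]) -> None:
    # 1. The repaired node takes its resources back locally.
    repaired.held.extend(resources)
    # 2. Every node in the cluster updates its volume and device ownership view.
    for node in cluster:
        for resource in resources:
            node.owner_view[resource] = repaired.name
    # 3. On the release message, the backup node drops the resources it took over.
    backup.held = [item for item in backup.held if item not in resources]


node_a, node_b, node_c = Node("node-a"), Node("node-b"), Node("node-c")
taken_over = ["vol-a", "lun-1"]
node_b.held = ["vol-b"] + taken_over           # node-b served node-a's resources
for member in (node_a, node_b, node_c):
    member.owner_view = {resource: "node-b" for resource in taken_over}

fail_back(node_a, node_b, [node_a, node_b, node_c], taken_over)
```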
Whether the backup node takes over the storage resources of the failed node, or the failed node recovers and takes its original storage resources back from the backup node, the TCP connections with the clients of the existing service connections need to be reestablished. The process of reestablishing a TCP connection is shown in Fig. 7 and includes:
1. Acquiring the connection information from before the takeover, constructing an ACK request message whose sequence number is zero (sequence = 0), and sending the ACK request message to a previously connected client;
2. Receiving an ACK response message from the client, wherein sequence = N in the ACK response message;
3. Sending an RST request to the client, notifying the client to reestablish the TCP connection;
4. Establishing a TCP connection with the client once the client has reestablished its transport port.
The method uses a TCP spoofing technique in which the takeover node actively reestablishes the connections with the clients that had established connections, so that takeover is transparent and service is not interrupted. Realizing transparent takeover requires modifying the communication protocol of the cluster storage system and restructuring the communication flow between the server and client software modules. Transparent failure takeover is critical to key service systems: the client sees only a transient blocking, with no connection interruption or abnormal exit, so data consistency and service continuity are guaranteed and higher cluster availability is achieved.
With the technical solution of the invention, not only the IP and service process resources of the failed node but also its storage software service processes and physical storage resources are taken over during resource switching between nodes. The NFS/CIFS/HTTP/FTP/iSCSI protocols and the PanaFS protocol are supported, and a TCP/IP connection reestablishment technique is used to take over the failed node transparently, so that no service interruption occurs during takeover.
The above description is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (3)
1. A method for implementing high availability of cluster storage, characterized in that the method comprises:
receiving monitoring information through a periodically triggered monitoring event, and determining the local service state;
when the service state is abnormal, determining a backup node in a SAN (storage area network) cluster according to a round-robin scheduling algorithm;
sending the local service state information to other nodes and sending a backup request to the backup node;
responding to the backup request of a failed node, and receiving the service state information of the failed node;
triggering a takeover event according to the received service state information of the failed node;
acquiring the volume information, storage device information and SAN storage information of the failed node and generating local configuration information;
mounting the storage devices in the SAN cluster and starting a distributed file system service routine, whereupon the volumes on the storage devices are restored to use;
wherein, if the failed node has been repaired, the method further comprises:
taking the resources taken over by the backup node back to the local node;
sending a message to other nodes in the SAN cluster to update the volume information and the storage device information;
sending a resource release message to the backup node;
wherein, when a node takes over resources, the method further comprises:
acquiring the connection information from before the takeover, constructing an acknowledgement (ACK) request message whose sequence number is zero (sequence = 0), and sending the ACK request message to a previously connected client;
receiving an ACK response message from the client, wherein sequence = N in the ACK response message;
sending a reset (RST) request to the client, notifying the client to reestablish the TCP connection;
and the client that was connected to the failed node reestablishes the TCP connection.
2. A node apparatus for realizing high availability of cluster storage, comprising: the system comprises a timer module, an event processing module, a monitoring module and a communication module; wherein,
the timer module is used for triggering a monitoring event to the event processing module and monitoring the resources and services of the node at regular time;
the monitoring module is used for monitoring the specified service state and returning service state information to the event processing module;
the communication module is used for information transmission and data synchronization among all node devices;
the event processing module is used for determining a backup node in the SAN cluster according to a round-robin scheduling algorithm when the service state is abnormal, sending local state information to other nodes, sending a backup request to the backup node, responding to the backup request of the fault node, and receiving the service state information of the fault node;
wherein, the node apparatus further comprises: a take-over module and a release module; wherein,
a takeover module, configured to take over various resources of a failed node when the node apparatus is used as a backup node, where the resources include an SAN storage device, a volume, and a corresponding distributed file system service, generate local configuration information, mount a storage device in the SAN cluster, and start a distributed file system service routine, where a volume on the storage device is recovered for use;
the releasing module is used for releasing each resource taken over by the taking-over module after the failed node is recovered;
after the node device is used as a backup node to take over the resources of a fault node, acquiring connection information before taking over, constructing an Acknowledgement Character (ACK) request message with an automatically increased number sequence of zero (sequence 0), and sending the ACK request message to a previously connected client; receiving an ACK response message of the client, wherein a sequence in the ACK response message is N; and sending a reset connection (RST) request to the client, and reestablishing the TCP connection with the client connected with the failed node.
3. A system for implementing high availability of cluster storage, comprising the node apparatus of claim 2 and a storage device in a SAN cluster.
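The two failover steps recited in claim 2, choosing a backup node by round-robin scheduling and resetting the stale client connections, can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the patented implementation: the claims do not say how the round-robin order is defined, so `choose_backup` assumes a fixed ring of node names and picks the first healthy node after the failed one. The claims also do not say which sequence the RST carries, so the sketch assumes it reuses the value N learned from the client's ACK response. `Segment`, `ToyClient`, and `reset_client_connection` are hypothetical names, and no real TCP packets are sent.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


def choose_backup(nodes: Sequence[str], failed: str, healthy: Set[str]) -> Optional[str]:
    """Round-robin choice (assumed ring order): walk the ring starting just
    after the failed node and return the first healthy node, else None."""
    start = nodes.index(failed)
    for step in range(1, len(nodes)):
        candidate = nodes[(start + step) % len(nodes)]
        if candidate in healthy:
            return candidate
    return None


@dataclass
class Segment:
    flag: str       # "ACK" or "RST"
    sequence: int   # the "sequence" field named in the claim


class ToyClient:
    """Hypothetical stand-in for a client that was connected to the failed node."""

    def __init__(self, current_sequence: int) -> None:
        self.current_sequence = current_sequence  # the value N in the claim
        self.connected = True

    def receive(self, segment: Segment) -> Optional[Segment]:
        if segment.flag == "ACK":
            # Answer the probe with an ACK that reveals the sequence in use.
            return Segment("ACK", self.current_sequence)
        if segment.flag == "RST" and segment.sequence == self.current_sequence:
            # Assumption: the reset is honoured only when it carries sequence N.
            self.connected = False
        return None


def reset_client_connection(client: ToyClient) -> int:
    """Backup-node side of the exchange; returns the learned sequence N."""
    probe = Segment("ACK", 0)                       # ACK request with sequence 0
    reply = client.receive(probe)                   # ACK response carrying N
    if reply is None:
        raise RuntimeError("client did not answer the ACK probe")
    client.receive(Segment("RST", reply.sequence))  # RST so the client reconnects
    return reply.sequence


if __name__ == "__main__":
    ring = ["node-a", "node-b", "node-c", "node-d"]
    backup = choose_backup(ring, failed="node-b", healthy={"node-a", "node-d"})
    client = ToyClient(current_sequence=4821)
    learned = reset_client_connection(client)
    print(backup, learned, client.connected)  # prints: node-d 4821 False
```

In this model, once the client has dropped the old connection it is free to open a new TCP connection to the backup node, which by then has mounted the storage device and started the file system service.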
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210363576.5A CN102868754B (en) | 2012-09-26 | 2012-09-26 | A kind of realize the method for cluster-based storage high availability, node apparatus and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102868754A (en) | 2013-01-09 |
CN102868754B (en) | 2016-08-03 |
Family
ID=47447340
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210363576.5A Active CN102868754B (en) | 2012-09-26 | 2012-09-26 | A kind of realize the method for cluster-based storage high availability, node apparatus and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102868754B (en) |
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103501338B (en) * | 2013-09-30 | 2017-04-05 | 华为技术有限公司 | A kind of lock restoration methods, equipment and NFS |
CN103605616A (en) * | 2013-11-21 | 2014-02-26 | 浪潮电子信息产业股份有限公司 | Multi-controller cache data consistency guarantee method |
CN103607311B (en) * | 2013-11-29 | 2017-01-18 | 厦门市美亚柏科信息股份有限公司 | System and method for reestablishing TCP connection seamlessly |
CN103701906B (en) * | 2013-12-27 | 2017-06-09 | 北京奇安信科技有限公司 | Distributed real time computation system and its data processing method |
CN105208069B (en) * | 2014-06-27 | 2018-05-22 | 中国科学院上海生命科学研究院 | Brain function network data cloud system |
CN105430026A (en) * | 2014-09-04 | 2016-03-23 | 中国石油化工股份有限公司 | Cloud storage data synchronization method based on a plurality of control strategies |
CN105471945A (en) * | 2014-09-04 | 2016-04-06 | 中国石油化工股份有限公司 | Application method of cloud storage in seismic integrated interpretation |
CN104391654B (en) * | 2014-11-06 | 2018-02-06 | 浪潮(北京)电子信息产业有限公司 | A kind of shared disk management method and system |
CN105337762A (en) * | 2015-09-28 | 2016-02-17 | 浪潮(北京)电子信息产业有限公司 | File sharing method supporting automatic failover |
CN105592139B (en) * | 2015-10-28 | 2019-03-15 | 新华三技术有限公司 | A kind of the HA implementation method and device of distributed file system management platform |
CN105471622B (en) * | 2015-11-12 | 2019-03-01 | 武汉噢易云计算股份有限公司 | A kind of high availability method and system of the control node active-standby switch based on Galera |
CN105516252A (en) * | 2015-11-26 | 2016-04-20 | 华为技术有限公司 | TCP (Transmission Control Protocol) connection switching method, apparatus and system |
CN105337780B (en) * | 2015-12-01 | 2018-09-18 | 迈普通信技术股份有限公司 | A kind of server node configuration method and physical node |
CN107231246A (en) * | 2016-03-23 | 2017-10-03 | 北京佳讯飞鸿电气股份有限公司 | A kind of server access system and method |
CN105930103B (en) * | 2016-05-10 | 2019-04-16 | 南京大学 | A kind of correcting and eleting codes covering write method of distributed storage CEPH |
CN108268210B (en) * | 2016-12-30 | 2022-03-08 | 中移(苏州)软件技术有限公司 | Information processing method, computing node and storage node |
CN106951444A (en) * | 2017-02-17 | 2017-07-14 | 深圳市嘉力达节能科技股份有限公司 | Architectural engineering information processing method and device |
CN107454165A (en) * | 2017-08-04 | 2017-12-08 | 郑州云海信息技术有限公司 | Access method and device of a kind of hadoop cluster to ceph clusters |
CN107566182A (en) * | 2017-09-14 | 2018-01-09 | 郑州云海信息技术有限公司 | The adapting method and system of a kind of NFS |
CN109726600B (en) * | 2017-10-31 | 2023-07-14 | 伊姆西Ip控股有限责任公司 | System and method for providing data protection for super fusion infrastructure |
CN111209260A (en) * | 2019-12-30 | 2020-05-29 | 创新科技术有限公司 | NFS cluster based on distributed storage and method for providing NFS service |
CN111586138B (en) * | 2020-04-30 | 2022-10-21 | 中国工商银行股份有限公司 | Job processing method, device and system and electronic equipment |
CN111787113B (en) * | 2020-07-03 | 2021-09-03 | 北京大道云行科技有限公司 | Node fault processing method and device, storage medium and electronic equipment |
CN111949452B (en) * | 2020-09-18 | 2022-09-20 | 苏州浪潮智能科技有限公司 | Method and device for rapidly recovering IO (input/output) in single-node fault of storage system |
CN112104513B (en) * | 2020-11-02 | 2021-02-12 | 武汉中科通达高新技术股份有限公司 | Visual software load method, device, equipment and storage medium |
CN113472566A (en) * | 2021-06-11 | 2021-10-01 | 北京市大数据中心 | Status monitoring method of union block chain and master node status monitoring system |
CN115134219A (en) * | 2022-06-29 | 2022-09-30 | 北京飞讯数码科技有限公司 | Device resource management method and device, computing device and storage medium |
CN116155686B (en) * | 2023-01-30 | 2024-05-31 | 浪潮云信息技术股份公司 | Method for judging node faults in cloud environment |
CN118035199B (en) * | 2024-01-12 | 2024-11-19 | 湖南国科亿存信息科技有限公司 | NFS server control method and device for preventing abnormal reading and writing during high availability switching |
CN118069376B (en) * | 2024-04-18 | 2024-07-02 | 北京大道云行科技有限公司 | Multi-tenant high-availability system based on SAN storage |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101651559A (en) * | 2009-07-13 | 2010-02-17 | 浪潮电子信息产业股份有限公司 | Failover method of storage service in double controller storage system |
CN102340530A (en) * | 2010-07-26 | 2012-02-01 | 杭州信核数据科技有限公司 | Method and system for memory space take-over and data migration |
CN102413172A (en) * | 2011-10-31 | 2012-04-11 | 北京联创信安科技有限公司 | Parallel data sharing method based on cluster technology and apparatus thereof |
CN102571904A (en) * | 2011-10-11 | 2012-07-11 | 浪潮电子信息产业股份有限公司 | Construction method of NAS cluster system based on modularization design |
CN102664757A (en) * | 2012-04-25 | 2012-09-12 | 浙江宇视科技有限公司 | Cascading method and equipment for storage devices |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1254748C (en) * | 2003-10-31 | 2006-05-03 | 清华大学 | Method for accessing distributed and virtualized storage in local network |
CN101179432A (en) * | 2007-12-13 | 2008-05-14 | 浪潮电子信息产业股份有限公司 | A Method of Realizing System High Availability in Multi-machine Environment |
CN102655460B (en) * | 2012-01-05 | 2014-09-24 | 中国工商银行股份有限公司 | Redundancy backup method and system of production server |
Non-Patent Citations (1)
Title |
---|
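Chen Zhiling et al., "Design and Implementation of the Mass Storage File System Castor FS" (《海量存储文件系统Castor FS的设计与实现》), Computer Engineering and Applications (《计算机工程与应用》), No. 4, 2006-04-30, pp. 108-110 * |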
Also Published As
Publication number | Publication date |
---|---|
CN102868754A (en) | 2013-01-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102868754B (en) | A kind of realize the method for cluster-based storage high availability, node apparatus and system | |
US11075795B2 (en) | Arbitration method, apparatus, and system used in active-active data centers | |
KR101513863B1 (en) | Method and system for network element service recovery | |
US7386610B1 (en) | Internet protocol data mirroring | |
EP1963985B1 (en) | System and method for enabling site failover in an application server environment | |
CN102932210B (en) | Method and system for monitoring node in PaaS cloud platform | |
EP3210367B1 (en) | System and method for disaster recovery of cloud applications | |
CN101741536B (en) | Data level disaster-tolerant method and system and production center node | |
EP1955506B1 (en) | Methods, systems, and computer program products for session initiation protocol (sip) fast switchover | |
WO2013043439A1 (en) | Storage area network attached clustered storage system | |
WO2016202051A1 (en) | Method and device for managing active and backup nodes in communication system and high-availability cluster | |
JP2014026321A (en) | Storage device, information processing device, information processing system, access control method, and access control program | |
JPWO2020044934A1 (en) | Communication equipment, methods, and programs | |
CN115801642B (en) | RDMA communication management module, method, equipment and medium based on state control | |
EP3167372B1 (en) | Methods for facilitating high availability storage services and corresponding devices | |
CN109474694A (en) | A management and control method and device for a NAS cluster based on a SAN storage array | |
JP2005301436A (en) | Cluster system and failure recovery method in cluster system | |
CN113326100B (en) | Cluster management method, device, equipment and computer storage medium | |
CN117201507A (en) | Cloud platform switching method and device, electronic equipment and storage medium | |
CN114301763A (en) | Distributed cluster fault processing method and system, electronic device and storage medium | |
CN102281159A (en) | Recovery method of cluster system | |
JP2013161266A (en) | Redundant control system for call processing information, and backup maintenance server for the same | |
CN114629778A (en) | A kind of IP multimedia service fault processing method, electronic device and storage medium | |
CN111414411A (en) | High availability database system | |
CN110890989A (en) | Channel connection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 100085 No. 1, building 3, building ten, No. 8, 813 street, Beijing, Haidian District; Applicant after: Beijing Lianchuang Xinan Technology Co., Ltd.; Address before: 100085, room 712, building 7, block D, Jinyu Ka Wah building, No. 9, 3rd Street, Haidian District, Beijing; Applicant before: Beijing Lianchuang Xinan Technology Co.,Ltd. |
COR | Change of bibliographic data | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |