
CN102868754B - Method, node apparatus and system for realizing high availability of cluster storage - Google Patents


Info

Publication number
CN102868754B
Authority
CN
China
Prior art keywords
node
information
storage
cluster
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210363576.5A
Other languages
Chinese (zh)
Other versions
CN102868754A (en)
Inventor
Liu Aigui (刘爱贵)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING LIANCHUANG XINAN TECHNOLOGY CO LTD
Original Assignee
BEIJING LIANCHUANG XINAN TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2012-09-26
Filing date
2012-09-26
Publication date
2016-08-03
2012-09-26 Application filed by BEIJING LIANCHUANG XINAN TECHNOLOGY CO LTD
2012-09-26 Priority to CN201210363576.5A
2013-01-09 Publication of CN102868754A
Application granted
2016-08-03 Publication of CN102868754B
Legal status: Active
Anticipated expiration


Landscapes

  • Hardware Redundancy (AREA)

Abstract

The invention discloses a method, node apparatus and system for realizing high availability of cluster storage. The method includes: triggering a takeover event according to received node state information; acquiring the volume information, storage device information and SAN storage information of the failed node and generating local configuration information; and mounting the storage devices in the SAN cluster and starting the distributed file system service, after which the volumes on those storage devices can be used again. During resource switching between nodes, the invention takes over not only the IP and service process resources of the failed node but also its storage software service processes and physical storage resources. It supports the NFS, CIFS, HTTP, FTP and iSCSI protocols as well as the PanaFS protocol, and uses TCP/IP connection re-establishment to take over the failed node transparently, so that no service interruption occurs during takeover.

Description

Method, node device and system for realizing high availability of cluster storage
Technical Field
The present invention relates to the technical field of network storage systems, and in particular, to a method, a node apparatus, and a system for implementing high availability of cluster storage.
Background
Driven by cloud storage and big data, data volumes are growing explosively. According to research, the digital universe is expected to reach 35.2 ZB in 2020, a 44-fold increase over the 0.8 ZB of 2009, and more than 80% of this data is unstructured. The data explosion caused by data-intensive applications such as high-performance computing, medical imaging, oil and gas exploration, digital media and the social web keeps posing new and serious challenges for storage. Cluster storage is a scale-out storage architecture whose capacity and performance grow linearly, and it has been widely accepted in the global market. Besides high performance and scalability, cluster storage also offers high availability, which is especially critical for core enterprise business systems because it ensures the continuity of key services.
Existing cluster storage solutions mainly address availability through redundancy, including replication, erasure coding, and active/standby or active/active high-availability (HA) techniques. Replication can effectively improve data availability by keeping different numbers of copies, but storage utilization is low (only 1/N of raw capacity with N copies) and data management becomes more complex. Erasure coding improves availability through redundant encoding; its space overhead and data redundancy are lower, so storage utilization is high, but the encoding is complex and computation-heavy, which degrades service performance, and it suits clusters with a large number of nodes. Active/standby HA also relies on redundancy to achieve high availability, but wastes a large amount of storage. Active/active HA monitors the nodes and switches a failed node's resources (IP, service processes, service data, etc.) to healthy nodes, so that the whole system keeps serving clients without interruption; it improves availability, provides load balancing and achieves high resource utilization. Its main problem is that resource switching can interrupt service, and usually only IP and service process resources are taken over, while service data or physical storage resources must be managed by external systems.
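The utilization comparison above comes down to simple ratios. The sketch below is illustrative and not taken from the patent: it assumes N full copies for replication and a generic erasure code with k data fragments and m parity fragments, the usual (k, m) notation.

```python
from fractions import Fraction

def replication_utilization(copies: int) -> Fraction:
    """Usable fraction of raw capacity when every block is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return Fraction(1, copies)

def erasure_code_utilization(k: int, m: int) -> Fraction:
    """Usable fraction of raw capacity for a code with k data and m parity fragments."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return Fraction(k, k + m)

if __name__ == "__main__":
    print(replication_utilization(3))      # 1/3
    print(erasure_code_utilization(8, 2))  # 4/5
```

Assuming an MDS code such as Reed-Solomon, three-way replication and an 8+2 code both survive the loss of any two copies or fragments, yet the code leaves 80% of raw capacity usable against about 33% for replication; the price is the encoding computation the paragraph mentions.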
Disclosure of Invention
The invention aims to provide a method, a node device and a system for realizing high availability of cluster storage, which can take over not only IP and service process resources of a fault node but also storage software service processes and physical storage resources of the fault node during resource switching between nodes.
To achieve this purpose, the invention adopts the following technical solution:
A method for realizing high availability of cluster storage, the method comprising:
triggering a takeover event according to the received node state information;
acquiring the volume information, storage device information and storage area network (SAN) storage information of the failed node, and generating local configuration information;
mounting the storage devices in the SAN cluster and starting the distributed file system service, so that the volumes on those storage devices are restored to use.
Before the takeover event is triggered according to the received node state information, the method further includes:
responding to a backup request from the failed node and receiving the state information of the failed node.
Before the failed node sends the backup request, the method further includes:
receiving monitoring information through a periodically triggered monitoring event, and determining the state of the local service;
when the service state is abnormal, selecting a backup node in the SAN cluster using a round-robin scheduling algorithm;
sending the local state information to the other nodes and sending a backup request to the backup node.
The method further comprises:
sending a message to the other nodes in the SAN cluster to update the volume information and the storage device information.
If the failed node has been repaired, the method further comprises:
taking back, to the local node, the resources that the backup node took over;
sending a message to the other nodes in the SAN cluster to update the volume information and the storage device information;
sending a resource release message to the backup node.
When a node takes over resources, the method further comprises:
acquiring the connection information from before the takeover, constructing an acknowledgement (ACK) message with a sequence number of zero (sequence = 0), and sending it to each previously connected client;
receiving the client's ACK response, which carries the correct sequence number (sequence = N);
sending a reset (RST) message to the client to make it re-establish the TCP connection;
establishing a new TCP connection with the client once the client has reopened its transport port.
A node device for realizing high availability of cluster storage comprises a timer module, an event processing module, a monitoring module and a communication module, wherein:
the timer module triggers monitoring events to the event processing module so that the node's resources, services and so on are monitored periodically;
the monitoring module monitors the state of a specified service and returns that state information to the event processing module;
the communication module handles message transmission and data synchronization among the node devices;
the event processing module receives the results returned by the other modules in the device and schedules the next operation based on them.
The node device further comprises a takeover module and a release module, wherein:
the takeover module takes over the resources of a failed node when the node device acts as the backup node; these resources include SAN storage devices, volumes and the corresponding distributed file system services;
the release module releases the resources taken over by the takeover module once the failed node has recovered.
After taking over the resources of the failed node as the backup node, the node device re-establishes TCP connections with the clients that were connected to the failed node.
A system for realizing high availability of cluster storage comprises at least two such node devices and the storage devices in the SAN.
With the technical solution of the invention, during resource switching between nodes the backup node takes over not only the IP and service process resources of the failed node but also its storage software service processes and physical storage resources. The solution supports the NFS, CIFS, HTTP, FTP and iSCSI protocols as well as the PanaFS protocol, and uses TCP/IP connection re-establishment to make the takeover of a failed node transparent, so that no service interruption occurs during takeover.
Drawings
Fig. 1 is a schematic structural diagram of a node apparatus for implementing high availability of cluster storage according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a node apparatus for implementing high availability of cluster storage as a standby node according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a system for implementing high availability of cluster storage according to an embodiment of the present invention.
Fig. 4 is a flowchart of a method for implementing high availability of cluster storage according to an embodiment of the present invention.
Fig. 5 is a flowchart of a backup request issued by a failed node in an embodiment of the present invention.
Fig. 6 is a schematic diagram of an information interaction process in the system after the failed node is repaired in the embodiment of the present invention.
Fig. 7 is a schematic process diagram of a node taking over service resources and a client reestablishing a TCP connection in an embodiment of the present invention.
Detailed Description
The technical solution of the invention is further explained below through specific embodiments with reference to the drawings.
Fig. 1 is a schematic structural diagram of a node apparatus for implementing high availability of cluster storage according to an embodiment of the present invention. The node apparatus includes a timer module, an event processing module, a monitoring module and a communication module, wherein:
the timer module triggers monitoring events to the event processing module so that the node's resources, services and so on are monitored periodically.
The HA system is event-driven: a timer periodically triggers monitoring of the node's resources and services, and if a state changes, the node synchronizes the information with the other nodes and triggers the corresponding event for processing. The timer module fires defined events at configured time intervals, including periodic monitoring events and periodic keepalive events.
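A minimal sketch of the timer module follows. It assumes a deterministic simulated clock rather than real wall-clock timers so that the firing order is easy to check; the class name `EventTimer`, the event names and the intervals are hypothetical, since the patent does not specify them.

```python
import heapq
from typing import Callable, List, Tuple

class EventTimer:
    """Fires named events at fixed intervals against a simulated clock."""

    def __init__(self) -> None:
        # Min-heap of (next due time, event name, interval).
        self._queue: List[Tuple[float, str, float]] = []

    def every(self, interval: float, event: str) -> None:
        """Schedule `event` to fire at interval, 2*interval, 3*interval, ..."""
        if interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self._queue, (interval, event, interval))

    def run_until(self, end: float, dispatch: Callable[[float, str], None]) -> None:
        """Fire every event due at or before `end`, in time order, re-arming each one."""
        while self._queue and self._queue[0][0] <= end:
            due, event, interval = heapq.heappop(self._queue)
            dispatch(due, event)  # hand the event to the event processing module
            heapq.heappush(self._queue, (due + interval, event, interval))

if __name__ == "__main__":
    fired: List[Tuple[float, str]] = []
    timer = EventTimer()
    timer.every(5.0, "monitor")    # periodic service monitoring
    timer.every(2.0, "keepalive")  # periodic keepalive
    timer.run_until(6.0, lambda t, e: fired.append((t, e)))
    print(fired)
```

A production node would drive the same dispatch callback from a real clock; keeping the scheduling logic separate from time makes it testable.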
The monitoring module monitors the state of a specified service and returns the state information to the event processing module.
The monitoring module checks the specified service periodically and reports its state to the event processing module as a return value: 0 if the service is normal, a non-zero value otherwise.
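The 0 / non-zero convention maps naturally onto process exit codes. As a sketch, assuming each service is checked by running an external command (for instance `systemctl is-active <unit>` on a systemd host), the hypothetical `monitor_service` helper below returns that command's exit status and treats a check that cannot run or hangs as abnormal.

```python
import subprocess
import sys
from typing import Sequence

def monitor_service(check_cmd: Sequence[str], timeout: float = 5.0) -> int:
    """Run a health-check command; 0 means the service is normal, non-zero means abnormal."""
    try:
        result = subprocess.run(list(check_cmd), capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # A check that cannot be started or hangs is itself treated as abnormal.
        return 1
    return result.returncode

if __name__ == "__main__":
    print(monitor_service([sys.executable, "-c", "pass"]))                      # 0
    print(monitor_service([sys.executable, "-c", "import sys; sys.exit(3)"]))  # 3
```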
The communication module handles message transmission and data synchronization among the node devices, including periodic heartbeats and data synchronization.
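One plausible way the periodic heartbeat feeds failure detection is a per-peer timeout, sketched below. The patent gives neither a timeout value nor a data structure; `HeartbeatTracker` and the 3-second timeout in the usage are assumptions.

```python
from typing import Dict, List

class HeartbeatTracker:
    """Remembers when each peer was last heard from and reports peers that have gone silent."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def record(self, node: str, now: float) -> None:
        """Called by the communication module whenever a heartbeat arrives from `node`."""
        self.last_seen[node] = now

    def suspected_failed(self, now: float) -> List[str]:
        """Peers whose last heartbeat is older than `timeout` seconds, sorted by name."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)

if __name__ == "__main__":
    tracker = HeartbeatTracker(timeout=3.0)
    tracker.record("node1", 0.0)
    tracker.record("node2", 0.0)
    tracker.record("node1", 2.5)
    print(tracker.suspected_failed(4.0))  # ['node2']
```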
The event processing module receives the results returned by the other modules in the device and schedules the next operation based on them.
When the node shown in Fig. 1 acts as a backup node, the node apparatus further includes a takeover module and a release module, as shown in Fig. 2, wherein:
the takeover module takes over the resources of the failed node when the node apparatus acts as the backup node; these resources include SAN storage devices, volumes and the corresponding distributed file system services.
The takeover module is responsible for taking over the resources of the failed node, chiefly its SAN storage and the corresponding distributed file system service. It first acquires the SAN storage information, storage device information and volume information managed by the failed node, then generates a complete configuration on the local node, mounts the storage devices in the SAN cluster and starts the distributed file system service, after which the volumes built on those storage devices can be used again.
The distributed file system service can be provided by a distributed storage system such as Lustre, Panasas, Ceph, pNFS or PanaFS. The embodiment of the invention preferably uses the PanaFS distributed file system, which accesses data through the proprietary PanaFS protocol, serves as the foundation for access protocols such as CIFS, NFS, FTP and HTTP, and provides a unified shared storage space. PanaFS is compatible with the POSIX (Portable Operating System Interface) standard and offers high scalability, high availability, high performance and high efficiency.
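The takeover module's first steps, turning the failed node's SAN, device and volume information into local configuration, can be sketched as a pure function. The record layout (`FailedNodeInfo`), the `/mnt/<node>/<device>` mount-point scheme and the configuration keys are all hypothetical: the patent does not define a configuration format, and the sketch stops short of actually mounting anything or starting PanaFS.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class FailedNodeInfo:
    """Hypothetical record of the storage a failed node was managing."""
    node: str                # failed node name
    san_target: str          # SAN target/LUN identifier (format is an assumption)
    devices: List[str]       # block devices presented by the SAN
    volumes: Dict[str, str]  # volume name -> device holding it

def generate_local_config(local_node: str, info: FailedNodeInfo) -> Dict[str, object]:
    """Build the local configuration used to mount the failed node's devices and restore its volumes."""
    # One mount point per device, namespaced by the failed node so nothing collides locally.
    mounts = {dev: f"/mnt/{info.node}/{dev.rsplit('/', 1)[-1]}" for dev in info.devices}
    # Every volume must sit on a device that will be mounted, otherwise takeover is incomplete.
    missing = sorted(v for v, dev in info.volumes.items() if dev not in mounts)
    if missing:
        raise ValueError(f"volumes on unknown devices: {missing}")
    return {
        "owner": local_node,
        "taken_over_from": info.node,
        "san_target": info.san_target,
        "mounts": mounts,
        "volumes": {v: mounts[dev] for v, dev in info.volumes.items()},
    }
```

Validating that every volume maps to a known device before mounting anything lets the backup node refuse an inconsistent takeover instead of half-completing it.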
The release module releases the resources taken over by the takeover module once the failed node has recovered.
The event processing module of the node apparatus acts like a scheduler: it receives the results returned by the different modules and decides the next operation based on them. Its main steps are:
1. Receive the timer message and invoke the monitoring module.
2. Receive the message from the monitoring module. A return value of 0 means the service is normal; if the previous monitoring result was abnormal, the node has just been repaired and the resource recovery process must be executed, as shown in Fig. 6. A non-zero return value means the node's service is abnormal, and the following steps are executed:
(1) the node selects a backup node using a round-robin scheduling algorithm;
(2) it invokes the communication module to send its state information to the other nodes;
(3) it sends a message to the backup node, which runs the takeover module on receipt.
3. Receive messages from other nodes. If this node has been selected as the backup node, run the takeover module; if a message from a repaired node is received, run the release module.
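Steps 2 and 3 can be condensed into a small state machine (step 1 is simply the timer invoking the monitor). This is a sketch: the action strings and message kinds are hypothetical names, backup selection is passed in rather than computed, and sending the backup request only on the normal-to-abnormal transition is an assumption added so the request is not repeated on every failed check.

```python
from typing import List, Optional

class EventProcessor:
    """Scheduler-like event processing module; records decisions instead of acting on them."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.prev_status = 0          # last monitoring return value (0 = normal)
        self.actions: List[str] = []  # decisions taken, in order

    def on_monitor_result(self, status: int, backup: Optional[str] = None) -> None:
        # Step 2: interpret the monitoring module's return value.
        if status == 0:
            if self.prev_status != 0:
                # Abnormal -> normal: this node was repaired, start resource recovery.
                self.actions.append("recover-resources")
        elif self.prev_status == 0:
            if backup is None:
                self.actions.append("no-backup-available")
                return  # keep prev_status at 0 so the next failed check retries
            # (1) backup chosen by round robin, (2) broadcast state, (3) ask backup to take over.
            self.actions.append("broadcast-state")
            self.actions.append(f"backup-request:{backup}")
        self.prev_status = status

    def on_peer_message(self, kind: str) -> None:
        # Step 3: messages from other nodes.
        if kind == "selected-as-backup":
            self.actions.append("run-takeover")
        elif kind == "peer-repaired":
            self.actions.append("run-release")
```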
Fig. 3 is a schematic structural diagram of a system for implementing high availability of cluster storage according to an embodiment of the present invention.
In a cluster storage system based on a SAN architecture, the back end uses a mid-range to high-end disk array subsystem that supports RAID (redundant array of independent disks) levels such as 0, 1, 5, 6 and 10 and is connected to every cluster node through Fibre Channel (FC) interfaces. The SAN disk array protects data with these RAID levels and provides high availability through redundancy, at some cost in storage utilization. In this architecture, adding replication or erasure coding to make the cluster service highly available would either lower storage utilization further or significantly degrade cluster performance. When a cluster node server fails, the back-end SAN storage normally keeps working and the data on it remains complete and consistent. One of the other healthy cluster nodes can therefore be chosen to take over the resources and services of the failed node and continue serving data, ensuring service continuity. The HA method of this invention differs from existing HA solutions in that it targets SAN-based cluster storage systems and uses an active/active HA architecture: it takes over not only the IP and service process resources of the failed node but also its storage software service processes and physical storage resources, and it supports the NFS, CIFS, HTTP, FTP and iSCSI protocols as well as the PanaFS protocol. By re-establishing TCP/IP connections, it takes over the failed node transparently, so that no service interruption occurs during takeover.
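To make the utilization cost of the back-end RAID levels concrete, the sketch below computes the usable fraction of raw capacity for equal-size disks using the standard layout of each level (RAID 1 is treated as an n-way mirror). It is illustrative only and ignores hot spares and metadata space.

```python
from fractions import Fraction

def raid_usable_fraction(level: int, disks: int) -> Fraction:
    """Fraction of raw capacity available for data at common RAID levels (equal-size disks)."""
    if level == 0 and disks >= 1:
        return Fraction(1)                 # striping only, no redundancy
    if level == 1 and disks >= 2:
        return Fraction(1, disks)          # every disk holds a full mirror copy
    if level == 5 and disks >= 3:
        return Fraction(disks - 1, disks)  # one disk's worth of parity
    if level == 6 and disks >= 4:
        return Fraction(disks - 2, disks)  # two disks' worth of parity
    if level == 10 and disks >= 4 and disks % 2 == 0:
        return Fraction(1, 2)              # stripes over two-way mirrors
    raise ValueError(f"unsupported RAID {level} with {disks} disks")

if __name__ == "__main__":
    for level in (5, 6, 10):
        print(level, raid_usable_fraction(level, 8))
```

With 8 disks this gives 7/8 under RAID 5, 3/4 under RAID 6 and 1/2 under RAID 10; layering replication on top, for example three copies, would multiply any of these by a further 1/3, which is the compounding loss the paragraph argues against.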
The method leaves the storage utilization and performance of the cluster storage system unaffected, can transparently take over the complete set of system resources, and provides higher system availability. Its main design principles are:
1. When a node goes down or becomes abnormal and can no longer provide data storage service to upper-layer applications, the backup node must take over the SAN storage devices connected to that node and start the corresponding services, so that front-end applications can continue to store data normally.
2. To balance the load across nodes and avoid overloading the backup node, the SAN storage that was taken over must be returned once the failed node is repaired.
3. The takeover and recovery processes must have no noticeable impact on front-end data storage, and the distributed file system service must not be interrupted, so that takeover and recovery are transparent.
4. The backup node is selected from the currently healthy nodes using a round-robin method.
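Principle 4 can be sketched as a rotating cursor over the node list that skips the failed node and any node not in the OK state. `RoundRobinSelector` is a hypothetical name, and keeping a single cursor per selector is an assumption; the patent only names the round-robin method.

```python
from typing import List, Optional, Set

class RoundRobinSelector:
    """Picks backup nodes in rotation, skipping the failed node and nodes not in the OK state."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = list(nodes)
        self._next = 0  # index of the next candidate to consider

    def select(self, failed: str, healthy: Set[str]) -> Optional[str]:
        # Visit each node at most once per call, advancing the cursor as we go.
        for _ in range(len(self.nodes)):
            candidate = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if candidate != failed and candidate in healthy:
                return candidate
        return None  # no healthy node can act as backup

if __name__ == "__main__":
    sel = RoundRobinSelector(["n1", "n2", "n3", "n4"])
    print([sel.select("n1", {"n2", "n3", "n4"}) for _ in range(4)])  # ['n2', 'n3', 'n4', 'n2']
```

Because the cursor keeps advancing across failures, successive takeovers land on different nodes, which supports the load-balancing goal of principle 2.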
When a node in the cluster fails, the HA system uses a round-robin algorithm to select one of the nodes in the OK state to take over the failed node's IP address. A takeip event is triggered on the selected node, which then acts as the backup node and performs the failover operation. After the failed node recovers, the IP address that drifted away drifts back, and a takeip event is again triggered to perform the recovery operation. Existing HA solutions usually handle only the availability of the services on the cluster nodes, such as CIFS, NFS, HTTP and FTP, but not the TCP/IP connections or sessions established with a node before it failed; this causes service interruption during takeover and prevents transparent takeover. This method uses a TCP spoofing technique in which the takeover node actively re-establishes the connections with clients that were already connected, achieving transparent takeover. The connection re-establishment procedure is as follows:
1. The new (takeover) node obtains the previous connection information from shared storage, constructs an ACK message with a sequence number of zero (sequence = 0), and sends it to the client;
2. On receiving it, the client replies with an ACK carrying the correct sequence number (sequence = N);
3. Having obtained the correct sequence number, the new node sends an RST to tell the client to re-establish the TCP connection;
4. The client reopens its transport port and the TCP connection is re-established.
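The four steps rely on standard TCP behaviour: a segment whose sequence number is unacceptable is answered with an ACK of the form `<SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>` (RFC 793, section 3.9), and that `RCV.NXT` value tells the takeover node which sequence number its RST must carry to be accepted. The toy model below simulates steps 1 to 3 without real sockets; `Segment`, `ClientEndpoint` and `tickle_and_reset` are hypothetical names, window checks are reduced to exact matches, and step 4 (the client application reconnecting) is left out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    flags: str    # "ACK" or "RST"
    seq: int      # sender's sequence number
    ack: int = 0  # acknowledgement number (meaningful for ACK)

class ClientEndpoint:
    """Toy model of one client-side TCP connection; no real packets are sent."""

    def __init__(self, snd_nxt: int, rcv_nxt: int) -> None:
        self.snd_nxt = snd_nxt  # next sequence number the client will send
        self.rcv_nxt = rcv_nxt  # next sequence number the client expects from the server
        self.state = "ESTABLISHED"

    def receive(self, seg: Segment) -> Optional[Segment]:
        if seg.flags == "RST":
            # Simplification: accept a reset only if its sequence number is exactly RCV.NXT.
            if seg.seq == self.rcv_nxt:
                self.state = "CLOSED"  # the client application will now reconnect
            return None
        if seg.seq != self.rcv_nxt:
            # Unacceptable segment: answer with <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>.
            return Segment("ACK", seq=self.snd_nxt, ack=self.rcv_nxt)
        return None

def tickle_and_reset(client: ClientEndpoint) -> int:
    """Steps 1-3: probe with seq=0, read N from the client's ACK, then send RST with seq=N."""
    reply = client.receive(Segment("ACK", seq=0))  # step 1
    if reply is None:
        raise RuntimeError("client did not answer the probe (expected sequence may be 0)")
    n = reply.ack                                  # step 2: sequence = N
    client.receive(Segment("RST", seq=n))          # step 3
    return n

if __name__ == "__main__":
    client = ClientEndpoint(snd_nxt=1000, rcv_nxt=52000)
    print(tickle_and_reset(client), client.state)  # 52000 CLOSED
```

The probe only draws a reply if the client's expected sequence number is not already 0. A real implementation would also have to forge these segments from the failed node's IP address, which typically requires raw-socket privileges.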
Fig. 4 is a flowchart of a method for implementing high availability of cluster storage according to an embodiment of the present invention, where the method includes:
S401. Trigger a takeover event according to the received node state information. The backup node receives the state information of the failed node through the communication module and passes it to the event processing module, which triggers a takeover event.
S402. Acquire the volume information, storage device information and SAN storage information of the failed node and generate local configuration information. The local configuration is generated from the volume, storage device and SAN information used by the failed node, and is used to mount the corresponding storage resources during takeover.
S403. Mount the storage devices in the SAN cluster and start the PanaFS service, after which the volumes on those storage devices can be used again.
Before the backup node triggers the takeover event in step S401, it responds to the backup request of the failed node and receives the failed node's state information.
Before the failed node sends the backup request, the method further includes the following steps, as shown in Fig. 5:
S501. Receive monitoring information through a periodically triggered monitoring event and determine the state of the local service;
S502. When the service state is abnormal, select a backup node in the SAN cluster using a round-robin scheduling algorithm;
S503. The failed node sends its local state information to the other nodes and sends a backup request to the backup node.
After successfully taking over the storage resources, the backup node sends a message to the other nodes in the SAN cluster to update the volume information and the storage device information.
When the failed node has been repaired, as shown in Fig. 6, it takes back to the local node the resources that the backup node took over, sends a message to the other nodes in the SAN cluster to update the volume information and the storage device information, and sends a resource release message to the backup node. On receiving the resource release message, the backup node releases the resources it took over.
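The takeover and failback message flow can be sketched as updates to a shared ownership map. `ClusterView`, the "home" node notion and the log strings are hypothetical; the sketch only mirrors the order described above: take back the volumes, update the peers, then ask the backup node to release.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ClusterView:
    """Shared view of which node serves each volume, kept in sync by messages between nodes."""
    home: Dict[str, str]                                 # volume -> node that normally serves it
    owner: Dict[str, str] = field(default_factory=dict)  # volume -> node serving it now
    log: List[str] = field(default_factory=list)         # messages sent, in order

    def __post_init__(self) -> None:
        if not self.owner:
            self.owner = dict(self.home)

    def take_over(self, failed: str, backup: str) -> None:
        """Backup node takes over every volume the failed node was serving, then updates peers."""
        for vol, node in list(self.owner.items()):
            if node == failed:
                self.owner[vol] = backup
        self.log.append(f"update-peers: {failed} -> {backup}")

    def fail_back(self, repaired: str, backup: str) -> None:
        """Repaired node reclaims its volumes, updates peers, then asks the backup to release."""
        for vol, home in self.home.items():
            if home == repaired and self.owner.get(vol) == backup:
                self.owner[vol] = repaired
        self.log.append(f"update-peers: {backup} -> {repaired}")
        self.log.append(f"release-request: {backup}")
```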
Whether the backup node is taking over the storage resources of the failed node or the recovered node is taking back its original storage resources from the backup node, TCP connections must be re-established with the clients that have existing service connections. As shown in Fig. 7, the process includes:
1. acquiring the connection information from before the takeover, constructing an ACK message with a sequence number of zero (sequence = 0), and sending it to each previously connected client;
2. receiving the client's ACK response, which carries the correct sequence number (sequence = N);
3. sending an RST to the client to make it re-establish the TCP connection;
4. establishing a new TCP connection with the client once the client has reopened its transport port.
This method uses a TCP spoofing technique in which the takeover node actively re-establishes the connections with clients that were already connected, achieving transparent takeover without service interruption. Transparent takeover requires modifying the communication protocol of the cluster storage system and restructuring the communication flow between the server and client software modules. Transparent failover is critical for key business systems: the client sees only a brief stall, with no connection drops or abnormal exits, so data consistency and service continuity are preserved and the cluster achieves higher availability.
In summary, with the technical solution of the invention, during resource switching between nodes the backup node takes over not only the IP and service process resources of the failed node but also its storage software service processes and physical storage resources; the solution supports the NFS, CIFS, HTTP, FTP and iSCSI protocols and the PanaFS protocol, and uses TCP/IP connection re-establishment to take over the failed node transparently without interrupting service.
The above is only a preferred embodiment of the present invention, and the scope of protection of the invention is not limited to it. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed herein falls within the scope of protection of the invention, which shall therefore be determined by the claims.

Claims (3)

1. A method for realizing high availability of cluster storage, characterized by comprising the following steps:
receiving monitoring information through a periodically triggered monitoring event, and determining the local service state;
when the service state is abnormal, selecting a backup node in a storage area network (SAN) cluster using a round-robin scheduling algorithm;
sending the local service state information to the other nodes and sending a backup request to the backup node;
responding to the backup request of the failed node, and receiving the service state information of the failed node;
triggering a takeover event according to the received service state information of the failed node;
acquiring the volume information, storage device information and SAN storage information of the failed node and generating local configuration information;
mounting the storage devices in the SAN cluster and starting a distributed file system service routine, whereby the volumes on the storage devices are restored to use;
wherein, if the failed node has been repaired, the method further comprises:
taking back, to the local node, the resources taken over by the backup node;
sending a message to the other nodes in the SAN cluster to update the volume information and the storage device information;
sending a resource release message to the backup node;
wherein, when a node takes over resources, the method further comprises:
acquiring the connection information from before the takeover, constructing an acknowledgement (ACK) message with a sequence number of zero (sequence = 0), and sending the ACK message to each previously connected client;
receiving the client's ACK response, in which the sequence number is N;
sending a reset (RST) message to the client to notify it to re-establish the TCP connection;
whereupon the client that was connected to the failed node re-establishes the TCP connection.
2. A node apparatus for realizing high availability of cluster storage, comprising a timer module, an event processing module, a monitoring module and a communication module, wherein:
the timer module is configured to trigger monitoring events to the event processing module so that the resources and services of the node are monitored periodically;
the monitoring module is configured to monitor the state of a specified service and return service state information to the event processing module;
the communication module is configured for message transmission and data synchronization among the node apparatuses;
the event processing module is configured, when the service state is abnormal, to select a backup node in the SAN cluster using a round-robin scheduling algorithm, send the local state information to the other nodes and send a backup request to the backup node, and, in response to a backup request from a failed node, to receive the service state information of the failed node;
wherein the node apparatus further comprises a takeover module and a release module, wherein:
the takeover module is configured, when the node apparatus acts as the backup node, to take over the resources of the failed node, including SAN storage devices, volumes and the corresponding distributed file system services, to generate local configuration information, to mount the storage devices in the SAN cluster and to start a distributed file system service routine, whereby the volumes on the storage devices are restored to use;
the release module is configured to release the resources taken over by the takeover module after the failed node has recovered;
after the node apparatus, acting as the backup node, takes over the resources of a failed node, it acquires the connection information from before the takeover, constructs an acknowledgement (ACK) message with a sequence number of zero (sequence = 0) and sends it to each previously connected client; receives the client's ACK response, in which the sequence number is N; and sends a reset (RST) message to the client, thereby re-establishing the TCP connection with the client that was connected to the failed node.
3. A system for implementing high availability of cluster storage, comprising the node apparatus of claim 2 and a storage device in a SAN cluster.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210363576.5A CN102868754B (en) 2012-09-26 2012-09-26 Method, node apparatus and system for realizing high availability of cluster storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210363576.5A CN102868754B (en) 2012-09-26 2012-09-26 Method, node apparatus and system for realizing high availability of cluster storage

Publications (2)

Publication Number Publication Date
CN102868754A 2013-01-09
CN102868754B 2016-08-03

Family

ID=47447340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210363576.5A Active CN102868754B (en) 2012-09-26 2012-09-26 Method, node apparatus and system for realizing high availability of cluster storage

Country Status (1)

Country Link
CN (1) CN102868754B (en)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103501338B (en) * 2013-09-30 2017-04-05 华为技术有限公司 Lock recovery method, device and NFS
CN103605616A (en) * 2013-11-21 2014-02-26 浪潮电子信息产业股份有限公司 Multi-controller cache data consistency guarantee method
CN103607311B (en) * 2013-11-29 2017-01-18 厦门市美亚柏科信息股份有限公司 System and method for reestablishing TCP connection seamlessly
CN103701906B (en) * 2013-12-27 2017-06-09 北京奇安信科技有限公司 Distributed real time computation system and its data processing method
CN105208069B (en) * 2014-06-27 2018-05-22 中国科学院上海生命科学研究院 Brain function network data cloud system
CN105430026A (en) * 2014-09-04 2016-03-23 中国石油化工股份有限公司 Cloud storage data synchronization method based on a plurality of control strategies
CN105471945A (en) * 2014-09-04 2016-04-06 中国石油化工股份有限公司 Application method of cloud storage in seismic integrated interpretation
CN104391654B (en) * 2014-11-06 2018-02-06 浪潮(北京)电子信息产业有限公司 Shared disk management method and system
CN105337762A (en) * 2015-09-28 2016-02-17 浪潮(北京)电子信息产业有限公司 File sharing method supporting automatic failover
CN105592139B (en) * 2015-10-28 2019-03-15 新华三技术有限公司 HA implementation method and device for a distributed file system management platform
CN105471622B (en) * 2015-11-12 2019-03-01 武汉噢易云计算股份有限公司 Galera-based high availability method and system for active/standby switchover of control nodes
CN105516252A (en) * 2015-11-26 2016-04-20 华为技术有限公司 TCP (Transmission Control Protocol) connection switching method, apparatus and system
CN105337780B (en) * 2015-12-01 2018-09-18 迈普通信技术股份有限公司 Server node configuration method and physical node
CN107231246A (en) * 2016-03-23 2017-10-03 北京佳讯飞鸿电气股份有限公司 Server access system and method
CN105930103B (en) * 2016-05-10 2019-04-16 南京大学 Erasure-code overwrite method for the distributed storage system Ceph
CN108268210B (en) * 2016-12-30 2022-03-08 中移(苏州)软件技术有限公司 Information processing method, computing node and storage node
CN106951444A (en) * 2017-02-17 2017-07-14 深圳市嘉力达节能科技股份有限公司 Architectural engineering information processing method and device
CN107454165A (en) * 2017-08-04 2017-12-08 郑州云海信息技术有限公司 Method and device for a Hadoop cluster to access a Ceph cluster
CN107566182A (en) * 2017-09-14 2018-01-09 郑州云海信息技术有限公司 NFS adapting method and system
CN109726600B (en) * 2017-10-31 2023-07-14 伊姆西Ip控股有限责任公司 System and method for providing data protection for super fusion infrastructure
CN111209260A (en) * 2019-12-30 2020-05-29 创新科技术有限公司 NFS cluster based on distributed storage and method for providing NFS service
CN111586138B (en) * 2020-04-30 2022-10-21 中国工商银行股份有限公司 Job processing method, device and system and electronic equipment
CN111787113B (en) * 2020-07-03 2021-09-03 北京大道云行科技有限公司 Node fault processing method and device, storage medium and electronic equipment
CN111949452B (en) * 2020-09-18 2022-09-20 苏州浪潮智能科技有限公司 Method and device for rapidly recovering I/O upon a single-node failure in a storage system
CN112104513B (en) * 2020-11-02 2021-02-12 武汉中科通达高新技术股份有限公司 Visual software load method, device, equipment and storage medium
CN113472566A (en) * 2021-06-11 2021-10-01 北京市大数据中心 Status monitoring method for a consortium blockchain and master node status monitoring system
CN115134219A (en) * 2022-06-29 2022-09-30 北京飞讯数码科技有限公司 Device resource management method and device, computing device and storage medium
CN116155686B (en) * 2023-01-30 2024-05-31 浪潮云信息技术股份公司 Method for judging node faults in cloud environment
CN118035199B (en) * 2024-01-12 2024-11-19 湖南国科亿存信息科技有限公司 NFS server control method and device for preventing abnormal reading and writing during high availability switching
CN118069376B (en) * 2024-04-18 2024-07-02 北京大道云行科技有限公司 Multi-tenant high-availability system based on SAN storage

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101651559A (en) * 2009-07-13 2010-02-17 浪潮电子信息产业股份有限公司 Failover method of storage service in double controller storage system
CN102340530A (en) * 2010-07-26 2012-02-01 杭州信核数据科技有限公司 Method and system for memory space take-over and data migration
CN102413172A (en) * 2011-10-31 2012-04-11 北京联创信安科技有限公司 Parallel data sharing method based on cluster technology and apparatus thereof
CN102571904A (en) * 2011-10-11 2012-07-11 浪潮电子信息产业股份有限公司 Construction method of NAS cluster system based on modularization design
CN102664757A (en) * 2012-04-25 2012-09-12 浙江宇视科技有限公司 Cascading method and equipment for storage devices

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN1254748C (en) * 2003-10-31 2006-05-03 清华大学 Method for accessing distributed and virtualized storage in local network
CN101179432A (en) * 2007-12-13 2008-05-14 浪潮电子信息产业股份有限公司 A Method of Realizing System High Availability in Multi-machine Environment
CN102655460B (en) * 2012-01-05 2014-09-24 中国工商银行股份有限公司 Redundancy backup method and system of production server

Non-Patent Citations (1)

Title
Chen Zhiling et al., "Design and Implementation of Castor FS, a Mass Storage File System", Computer Engineering and Applications, 2006-04-30, No. 4, pp. 108-110 *

Similar Documents

Publication Publication Date Title
CN102868754B (en) Method, node apparatus and system for realizing high availability of cluster storage
US11075795B2 (en) Arbitration method, apparatus, and system used in active-active data centers
KR101513863B1 (en) Method and system for network element service recovery
US7386610B1 (en) Internet protocol data mirroring
EP1963985B1 (en) System and method for enabling site failover in an application server environment
CN102932210B (en) Method and system for monitoring node in PaaS cloud platform
EP3210367B1 (en) System and method for disaster recovery of cloud applications
CN101741536B (en) Data level disaster-tolerant method and system and production center node
EP1955506B1 (en) Methods, systems, and computer program products for session initiation protocol (sip) fast switchover
WO2013043439A1 (en) Storage area network attached clustered storage system
WO2016202051A1 (en) Method and device for managing active and backup nodes in communication system and high-availability cluster
JP2014026321A (en) Storage device, information processing device, information processing system, access control method, and access control program
JPWO2020044934A1 (en) Communication equipment, methods, and programs
CN115801642B (en) RDMA communication management module, method, equipment and medium based on state control
EP3167372B1 (en) Methods for facilitating high availability storage services and corresponding devices
CN109474694A (en) A management and control method and device for a NAS cluster based on a SAN storage array
JP2005301436A (en) Cluster system and failure recovery method in cluster system
CN113326100B (en) Cluster management method, device, equipment and computer storage medium
CN117201507A (en) Cloud platform switching method and device, electronic equipment and storage medium
CN114301763A (en) Distributed cluster fault processing method and system, electronic device and storage medium
CN102281159A (en) Recovery method of cluster system
JP2013161266A (en) Redundant control system for call processing information, and backup maintenance server for the same
CN114629778A (en) IP multimedia service fault processing method, electronic device and storage medium
CN111414411A (en) High availability database system
CN110890989A (en) Channel connection method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100085 No. 1, building 3, building ten, No. 8, 813 street, Beijing, Haidian District

Applicant after: Beijing Lianchuang Xinan Technology Co., Ltd.

Address before: 100085, room 712, building 7, block D, Jinyu Ka Wah building, No. 9, 3rd Street, Haidian District, Beijing

Applicant before: Beijing Lianchuang Xinan Technology Co.,Ltd.

COR Change of bibliographic data
C14 Grant of patent or utility model
GR01 Patent grant