
CN115914255B - Repeated frame control method, terminal and storage medium for cluster communication in storage system - Google Patents


Info

Publication number
CN115914255B
Authority
CN
China
Prior art keywords
cmnd
initiator
repeated frame
ids
repeated
Legal status
Active
Application number
CN202211324133.5A
Other languages
Chinese (zh)
Other versions
CN115914255A (en)
Inventor
张珠玉
张璐
Current Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202211324133.5A
Publication of CN115914255A
Application granted
Publication of CN115914255B


Landscapes

  • Computer And Data Communications (AREA)

Abstract


The present invention relates to the field of storage technology, and in particular to a duplicate frame control method, terminal, and storage medium for cluster communication in a storage system. The method initializes two global two-dimensional arrays, initiator_duplicated_ids and target_duplicated_ids, with node IDs as rows and cmnd cids as columns, to record the duplicate frame ID of each cmnd; dual-link or multi-link redundant connections are maintained between nodes. An IO request is initiated at the CL layer initiator end and received at the CL layer target end, and the target end returns a feedback result to the initiator end. The target end performs the duplicate frame check on the empty IO request, i.e., the request command without a data payload. If the duplicate frame check fails, the IO process is terminated, a UI alarm is raised, and feedback is sent to the initiator so that it resets the count to 0. If other abnormal conditions occur, a reset to 0 or a one-step rollback is likewise performed so that subsequent IO can proceed normally. The invention can detect cluster communication abnormalities while ensuring that most IOs proceed normally, thereby achieving stable and reliable cluster communication in the storage system.

Description

Repeated frame control method, terminal and storage medium for cluster communication in storage system
Technical Field
The present invention relates to the field of storage technologies, and in particular to a method, a terminal, and a storage medium for controlling repeated frames of cluster communication in a storage system.
Background
In a unified storage system, a cluster is built from multiple controllers, and large volumes of data are transmitted among cluster nodes by functional modules such as cache synchronization, mirroring, and redundant backup. Besides meeting the requirement for high-concurrency, high-bandwidth communication, the storage system must also ensure that communication is safe and reliable; a repeated frame control mechanism is therefore designed to solve the repeated frame problem in cluster communication of the storage system.
The repeated frame control mechanism generates a repeated frame count value and carries it along with the IO flow. The mechanism can display alarm information on the UI to prompt manual maintenance and rectification, can recover on its own, and handles various abnormal conditions so that cluster communication keeps operating normally. From the information displayed on the UI, the specific communication ports and physical lines can be identified and located, and physical problems such as faulty optical fiber cables, network cards, or networks can be analyzed. Beyond checking conventional Fibre Channel frames, the mechanism is extended to NTB link frames, IP link frames, and RDMA link frames, all handled by one unified control mechanism. It is therefore suitable for various protocol models and hardware link types, can cope with more complex field environments, and contributes to the high performance, high availability, and high reliability goals of the storage system.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a repeated frame control method, a terminal and a storage medium for cluster communication in a storage system.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
In a first aspect, in one embodiment of the present invention, there is provided a method for controlling repeated frames of cluster communication in a storage system, the method comprising the following steps:
initializing two global two-dimensional arrays, initiator_duplicated_ids and target_duplicated_ids, with node ids as rows and cmnd cids as columns, to record the repeated frame id of each cmnd;
wherein the nodes maintain dual-link or multi-link redundant connections;
initiating an IO request at the CL layer initiator end, receiving the IO request at the CL layer target end, and returning a feedback result from the target end to the initiator end;
wherein the target end performs the repeated frame check on the empty IO request;
if the repeated frame check fails, the IO flow is stopped, the UI raises an alarm, and feedback is sent to the initiator so that it resets the cmnd's repeated frame count value to 0;
if other abnormal conditions occur, a reset to 0 or a one-step rollback is likewise performed, so that subsequent IO can proceed normally.
As a further scheme of the invention, each link carries up to credit_max concurrent IOs; each IO is embodied as a protocol command (cmnd) structure, each cmnd is allocated from a cmnd resource pool, and each cmnd has a unique cid.
As a further scheme of the invention, when a link is unexpectedly disconnected, the initiator end of that link traverses all cmnds that were sent but have received no feedback, and resets their repeated frame count values in the initiator_duplicated_ids table to 0.
As a further scheme of the invention, when all links between two nodes are disconnected, i.e., the peer node loses connection, the initiator end resets to 0 every element in the row of the initiator_duplicated_ids table indexed by that node's id; after the node reconnects successfully, all related repeated frame count values start again from 0.
As a further scheme of the invention, the IO request is initiated at the CL layer initiator as follows:
a free cmnd is requested from the cmnd resource pool; its repeated frame count value is read from initiator_duplicated_ids, written into the corresponding field of the cmnd, and sent to the peer node.
As a further scheme of the present invention, the CL layer target receives the IO request as follows:
the repeated frame count value for the cid is read from the target_duplicated_ids array and compared with the value in the corresponding field of the received cmnd; if the two are equal, the IO flow continues;
if the match fails, the target actively aborts the IO, sets an abort mark in the feedback result, and raises a repeated frame alarm on the UI.
As a further aspect of the present invention, raising the repeated frame alarm on the UI further includes:
the alarm information indicates the WWPN of the specific physical link, to help locate the physical device and guide subsequent troubleshooting and maintenance.
As a further scheme of the invention, when the initiator receives the feedback result and it carries an abort mark, the cmnd's repeated frame count value in initiator_duplicated_ids is reset to 0, so that the value 0 is used the next time the cmnd is called.
In a second aspect, in yet another embodiment provided by the present invention, a computer device is provided, including a memory storing a computer program and a processor implementing steps of a repeated frame control method of cluster communication in a storage system when the computer program is loaded and executed by the processor.
In a third aspect, in yet another embodiment of the present invention, a storage medium is provided storing a computer program which, when loaded and executed by a processor, implements the steps of the repeated frame control method of cluster communication in the storage system.
The technical scheme provided by the invention has the following beneficial effects:
According to the repeated frame control method, terminal, and storage medium for cluster communication in a storage system provided by the invention, a global two-dimensional array with node ids as rows and cids as columns is designed, and each IO command is bound to a repeated frame count value, so that the initiator end and the target end stay consistent. The target end performs the repeated frame check on the empty IO request, so normal data transmission is not affected. If the repeated frame check fails, the IO flow is stopped, the UI raises an alarm, and feedback is sent to the initiator so that it resets the count value to 0. If other abnormal conditions occur, a reset to 0 or a one-step rollback is likewise performed, so that subsequent IO can proceed normally. The invention can thus detect abnormal cluster communication while ensuring that most IOs proceed normally, achieving stable and reliable cluster communication in the storage system.
These and other aspects of the invention will be more readily apparent from the following description of the embodiments. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other embodiments may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for controlling repeated frames of cluster communication in a storage system according to an embodiment of the invention;
FIG. 2 is a schematic diagram of cluster IO communication according to an embodiment of the present invention;
FIG. 3 is a block diagram of a terminal according to an embodiment of the present invention.
In the figures: 301, processor; 302, communication interface; 303, memory; 304, communication bus.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In particular, embodiments of the present invention are further described below with reference to the accompanying drawings.
Referring to FIG. 1 and FIG. 2, FIG. 1 is a flowchart of a method for controlling repeated frames of cluster communication in a storage system according to an embodiment of the invention. As shown in FIG. 1, the method includes steps S10 to S20.
S10, initializing two global two-dimensional arrays, initiator_duplicated_ids and target_duplicated_ids, with node ids as rows and cmnd cids as columns, to record the repeated frame id of each cmnd;
wherein the nodes maintain dual-link or multi-link redundant connections.
In the embodiment of the invention, all elements of both arrays are initialized to 0 by default. As cmnds are subsequently used, the corresponding elements of the two-dimensional arrays change.
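As a minimal sketch of step S10, assuming the two tables can be modelled as nested Python lists indexed `[node_id][cid]`, the initialization might look like this. `NUM_NODES`, `NUM_CIDS`, and `make_table` are hypothetical names; the patent does not fix the table sizes.

```python
from typing import List

# Hypothetical sizes; the patent does not fix these values.
NUM_NODES = 4   # rows: one per peer node id
NUM_CIDS = 8    # columns: one per cmnd cid in the resource pool

def make_table(num_nodes: int, num_cids: int) -> List[List[int]]:
    """Return a num_nodes x num_cids table with every count value set to 0."""
    # Build a fresh list per row so that rows never alias each other.
    return [[0] * num_cids for _ in range(num_nodes)]

# Both tables start at all zeros, as described for step S10.
initiator_duplicated_ids = make_table(NUM_NODES, NUM_CIDS)
target_duplicated_ids = make_table(NUM_NODES, NUM_CIDS)
```

Building each row separately matters in Python: `[[0] * n] * m` would make every row the same list object, so resetting one node's row would silently reset all of them.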
S20, initiating an IO request at the CL layer initiator end, receiving the IO request at the CL layer target end, and returning a feedback result from the target end to the initiator end;
wherein the target end performs the repeated frame check on the empty IO request;
if the repeated frame check fails, the IO flow is stopped, the UI raises an alarm, and feedback is sent to the initiator so that it resets the cmnd's repeated frame count value to 0;
if other abnormal conditions occur, a reset to 0 or a one-step rollback is likewise performed, so that subsequent IO can proceed normally.
Each link carries up to credit_max concurrent IOs (credit_max differs between link types). Each IO is embodied as a protocol command (cmnd) structure; each cmnd is allocated from a cmnd resource pool, has a unique cid, and is recycled after use.
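The cmnd resource pool can be sketched as a fixed set of `credit_max` command objects that are handed out and recycled, so a cid is reused by later IOs. This is an illustrative sketch only: `Cmnd`, `CmndPool`, and the `dup_count` field are hypothetical names, and a single pool per link is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Cmnd:
    cid: int            # unique command id; also the column index in the tables
    dup_count: int = 0  # repeated frame count value carried with this IO

class CmndPool:
    """Hypothetical pool allowing at most credit_max cmnds in flight."""

    def __init__(self, credit_max: int) -> None:
        self._free: List[Cmnd] = [Cmnd(cid) for cid in range(credit_max)]
        self.in_flight: Dict[int, Cmnd] = {}

    def alloc(self) -> Optional[Cmnd]:
        """Take a free cmnd, or return None when all credits are in use."""
        if not self._free:
            return None
        cmnd = self._free.pop()
        self.in_flight[cmnd.cid] = cmnd
        return cmnd

    def free(self, cmnd: Cmnd) -> None:
        """Recycle a cmnd; its cid will be reused by a later IO."""
        del self.in_flight[cmnd.cid]
        self._free.append(cmnd)
```

Because cids are recycled, the count value stored per cid is what tells successive uses of the same cid apart.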
In the embodiment of the present invention, when a link is unexpectedly disconnected, the initiator end of that link traverses all cmnds that were sent but have received no feedback, and resets their repeated frame count values in the initiator_duplicated_ids table to 0. The next time such a cmnd is reused, its repeated frame count value starts from 0 and passes the target's repeated frame check. The repeated frame count values of the other cmnds remain unchanged, because their values at the initiator and target ends are still the same.
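The link-failure handling can be sketched as follows, assuming the initiator keeps, per link, a list of cids that have been sent but have not yet received feedback. `on_link_down` and `outstanding_cids` are hypothetical names, not taken from the patent.

```python
from typing import List

def on_link_down(initiator_duplicated_ids: List[List[int]],
                 node_id: int,
                 outstanding_cids: List[int]) -> None:
    """Reset the count value of every cmnd sent on the failed link without feedback."""
    for cid in outstanding_cids:
        initiator_duplicated_ids[node_id][cid] = 0
    # Cids not in outstanding_cids are left alone: their initiator and
    # target values still agree, so they need no resynchronization.
```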
In the embodiment of the invention, when all links between two nodes are disconnected, i.e., the peer node loses connection, the initiator end resets to 0 every element in the row of the initiator_duplicated_ids table indexed by that node's id. After the node reconnects successfully, all related repeated frame count values start again from 0.
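When the whole node is lost, the reset widens from selected cids to the entire row for that peer. A sketch under the same nested-list assumption, with `on_node_down` as a hypothetical name:

```python
from typing import List

def on_node_down(initiator_duplicated_ids: List[List[int]], node_id: int) -> None:
    """Reset every count value in the row of the disconnected peer node."""
    row = initiator_duplicated_ids[node_id]
    for cid in range(len(row)):
        row[cid] = 0   # after reconnection, every cid for this peer starts from 0
```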
In the embodiment of the invention, the IO request is initiated at the CL layer initiator as follows:
A free cmnd is requested from the cmnd resource pool. Its repeated frame count value is read from initiator_duplicated_ids, written into the corresponding field of the cmnd, and sent to the peer node. The cmnd's repeated frame count value in the initiator_duplicated_ids array is then incremented, ready for the next use.
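The initiator-side step can be sketched as below: read the current value, stamp it into the cmnd, then increment the table entry. This is a hypothetical sketch; `initiator_prepare` and the `dup_count` field stand in for whatever the CL layer actually uses, and transmission to the peer is left out.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Cmnd:
    cid: int
    dup_count: int = 0   # hypothetical field that carries the count value to the peer

def initiator_prepare(initiator_duplicated_ids: List[List[int]],
                      node_id: int, cmnd: Cmnd) -> Cmnd:
    """Stamp the cmnd with the current count value, then advance the table entry."""
    cmnd.dup_count = initiator_duplicated_ids[node_id][cmnd.cid]  # value sent to the peer
    initiator_duplicated_ids[node_id][cmnd.cid] += 1              # ready for the next use
    return cmnd
```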
In the embodiment of the present invention, the CL layer target receives the IO request as follows:
The repeated frame count value for the cid is read from the target_duplicated_ids array and compared with the value in the corresponding field of the received cmnd. If the two are equal, the IO flow continues; if the count value carried by the cmnd is 0, the match is also considered successful, and the value in the target_duplicated_ids array is reset to 0. The value in the target_duplicated_ids array is then incremented, keeping it synchronized with the initiator side. If the match fails, the target actively aborts the IO, sets an abort mark in the feedback result, and raises a repeated frame alarm on the UI.
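The target-side check, including the special case that a received count of 0 always matches and resynchronizes the target entry, might be sketched as follows. `target_check` is a hypothetical name; the abort, feedback, and alarm actions are left to the caller, and leaving the target entry unchanged on a mismatch is an assumption, since the description does not say what happens to it.

```python
from typing import List

def target_check(target_duplicated_ids: List[List[int]],
                 node_id: int, cid: int, received: int) -> bool:
    """Return True if the IO may continue, False if it must be aborted."""
    expected = target_duplicated_ids[node_id][cid]
    if received == 0:
        # A count value of 0 always matches; treat the entry as reset to 0.
        expected = 0
    elif received != expected:
        # Mismatch: the caller aborts the IO, sets the abort mark in the
        # feedback, and raises the alarm. Entry left unchanged (assumption).
        return False
    # Advance the entry so it stays in step with the initiator.
    target_duplicated_ids[node_id][cid] = expected + 1
    return True
```

The zero rule is what lets the initiator recover by resetting to 0 after an abort or a link failure: whatever value the target holds, the next IO on that cid is accepted and both ends continue from 1.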
In an embodiment of the present invention, raising the repeated frame alarm on the UI further includes:
the alarm information indicates the WWPN of the specific physical link, to help locate the physical device and guide subsequent troubleshooting and maintenance.
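For the alarm itself, a WWPN is a 64-bit identifier that is conventionally displayed as eight colon-separated hexadecimal bytes. The sketch below formats such an alarm; the message text, `format_wwpn`, and `duplicate_frame_alarm` are hypothetical, since the patent does not specify the UI format.

```python
def format_wwpn(wwpn: int) -> str:
    """Render a 64-bit WWPN as eight colon-separated hex bytes, most significant first."""
    return ":".join(f"{(wwpn >> shift) & 0xFF:02x}" for shift in range(56, -8, -8))

def duplicate_frame_alarm(tx_wwpn: int, rx_wwpn: int, node_id: int, cid: int) -> str:
    """Build a hypothetical alarm string naming the link by its transmit/receive WWPNs."""
    return (f"Repeated frame detected: node {node_id}, cid {cid}, "
            f"link {format_wwpn(tx_wwpn)} -> {format_wwpn(rx_wwpn)}")
```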
In the embodiment of the invention, when the initiator receives the feedback result and it carries an abort mark, the cmnd's repeated frame count value in initiator_duplicated_ids is reset to 0, so that the value 0 is used the next time the cmnd is called.
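On the initiator side, handling the feedback reduces to a single reset when the abort mark is present. `Feedback`, its `aborted` field, and `initiator_on_feedback` are hypothetical stand-ins for the real feedback structure and handler.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Feedback:
    cid: int
    aborted: bool   # hypothetical name for the abort mark set by the target

def initiator_on_feedback(initiator_duplicated_ids: List[List[int]],
                          node_id: int, fb: Feedback) -> None:
    """Reset the cid's count value to 0 if the target aborted the IO."""
    if fb.aborted:
        # The next use of this cid carries 0, which the target accepts.
        initiator_duplicated_ids[node_id][fb.cid] = 0
```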
In an embodiment of the present invention, the other abnormal conditions include:
the common interface platform feeds back a failure result, i.e., the initiator sends the IO request and the peer responds after underlying processing, but the feedback indicates failure: the cmnd's repeated frame count value is reset to 0;
the received data is incomplete and frames have been lost, i.e., the size of the received data differs from the size of the data sent by the target: the cmnd's repeated frame count value is reset to 0;
during underlying processing, the IO is blocked by the SAN network and cannot be sent, i.e., the cmnd leaves the CL layer and reaches the common interface platform layer and the driver layer, but for physical reasons the IO is never sent and is returned directly to the CL layer: the cmnd's repeated frame count value is decremented by one, because the IO did not take effect, and the next time the cmnd is called the same value as before is sent;
other status value errors, i.e., a status assigned to the cmnd by the peer target or the underlying driver indicates an abnormal condition: the cmnd's repeated frame count value is reset to 0.
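The four abnormal cases listed above differ only in whether the initiator entry is reset or rolled back by one. A sketch, with a hypothetical `Failure` enumeration naming those cases and a hypothetical `on_abnormal` handler:

```python
from enum import Enum, auto
from typing import List

class Failure(Enum):
    PLATFORM_FAILURE = auto()  # peer responded, but with failure feedback
    SHORT_DATA = auto()        # received size differs from the size the target sent
    NOT_SENT = auto()          # blocked below the CL layer, never left the node
    BAD_STATUS = auto()        # abnormal status set by the peer target or the driver

def on_abnormal(initiator_duplicated_ids: List[List[int]],
                node_id: int, cid: int, kind: Failure) -> None:
    """Apply the reset-to-0 or one-step rollback for an abnormal IO."""
    if kind is Failure.NOT_SENT:
        # Roll back one step: the peer never saw this value, so resend it next time.
        initiator_duplicated_ids[node_id][cid] -= 1
    else:
        initiator_duplicated_ids[node_id][cid] = 0
```

Rolling back rather than resetting fits the not-sent case because the entry was incremented when the cmnd was stamped, yet the peer never saw that value; decrementing makes the next use resend it, and the target still expects it.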
Under normal conditions, repeated link channel frames should never be detected. A repeated frame alarm indicates that a duplicated link channel frame, an erroneous link channel frame, or an incomplete link channel frame was received, which means there is a problem with the link channel or the communication network. Other errors related to the link channel communication network may also be present.
The specific communication link that generated the repeated frame can be determined from the transmit and receive WWPNs indicated in the error data, and network monitoring tools can then be used to find the cause of the problem. Design errors, configuration errors, or software or hardware failures in any component of the link channel communication network (including inter-switch links) in the cluster communication network topology may cause repeated frames, erroneous frames, lost frames, and similar problems.
The invention designs the global two-dimensional arrays initiator_duplicated_ids and target_duplicated_ids, indexed by the connected node id and the command cid, to record the repeated frame count value of each command cmnd. Recording values in a global array instead of a single counter makes repeated frame detection possible while supporting multi-queue, multi-IO concurrency (concurrent IOs are carried by different cmnds). When the initiator initiates an IO, it binds the repeated frame count value of the cid to the cmnd (at this point the count values at the initiator and target ends are equal). When the target receives the IO, it performs the repeated frame check: if the match succeeds, the IO flow continues; if it fails, an alarm is raised to prompt manual maintenance. With the reset-to-0 and decrement operations at the initiator end, all IOs other than the alarmed abnormal IO continue to flow normally, and a cmnd that has been reset can continue to be used. Besides the NTB, FC, and IP links of the SCSI protocol, the control mechanism also applies to RDMA links of the NVMe protocol. The repeated frame control mechanism thus detects repeated frames of individual commands in the storage system's cluster communication environment while preserving IO concurrency, meets the requirements for higher IOPS and throughput, and achieves stability and reliability of the whole system.
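To see how the pieces fit together, here is a toy simulation of one initiator and one target, assuming frames are delivered as plain `(cid, count)` tuples and that an abort immediately resets the initiator entry. All names are hypothetical, and the sketch omits links, data transfer, and the UI.

```python
from typing import List, Tuple

# Toy two-node setup (hypothetical sizes and node ids).
NUM_NODES, NUM_CIDS = 2, 4
INITIATOR, TARGET = 0, 1
# Each node's table is indexed [peer node id][cid].
init_ids: List[List[int]] = [[0] * NUM_CIDS for _ in range(NUM_NODES)]  # lives on INITIATOR
tgt_ids: List[List[int]] = [[0] * NUM_CIDS for _ in range(NUM_NODES)]   # lives on TARGET

Frame = Tuple[int, int]  # (cid, repeated frame count value)

def send(cid: int) -> Frame:
    """Initiator: stamp the current count into the frame, then advance it."""
    count = init_ids[TARGET][cid]
    init_ids[TARGET][cid] += 1
    return cid, count

def receive(frame: Frame) -> bool:
    """Target: accept on match (or on 0); otherwise abort and reset the initiator."""
    cid, count = frame
    expected = 0 if count == 0 else tgt_ids[INITIATOR][cid]
    if count != expected:
        init_ids[TARGET][cid] = 0   # stands in for abort feedback + initiator reset
        return False
    tgt_ids[INITIATOR][cid] = expected + 1
    return True

f0, f1 = send(0), send(1)          # two concurrent IOs on different cids
results = [receive(f0), receive(f1)]
again = send(0)                    # cid 0 recycled; now carries count 1
results += [receive(again), receive(again)]  # second delivery is a duplicate
print(results)  # → [True, True, True, False]
```

The second delivery of `again` is caught because the target's entry for cid 0 has already moved past the count the frame carries, while cid 1 is unaffected; this is the per-command concurrency the global arrays are meant to preserve. Note that in this sketch a duplicated frame whose count is 0 would be accepted, since 0 always matches.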
It should be understood that although the steps are described in a certain order, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, some steps of this embodiment may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and whose order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, referring to fig. 3, a computer device is further provided in an embodiment of the present invention, including a processor 301, a communication interface 302, a memory 303, and a communication bus 304, where the processor 301, the communication interface 302, and the memory 303 perform communication with each other through the communication bus 304.
A memory 303 for storing a computer program;
the processor 301, when executing the computer program stored in the memory 303, implements steps S10 to S20 of the method embodiment described above.
The communication bus of the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The computer device includes user equipment and network equipment. User equipment includes, but is not limited to, computers, smartphones, and PDAs; network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing composed of a large number of computers or network servers, where cloud computing is a kind of distributed computing in which a group of loosely coupled computers forms one virtual supercomputer. The computer device may implement the invention on its own, or may access a network and implement the invention by interacting with other computer devices in the network. The network in which the computer device resides includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, and the like. It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
In one embodiment of the present invention there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method embodiments described above:
s10, initializing a global two-dimensional array (initiator_ duplicated _ids and target_ duplicated _ids) with node ids being rows and cmnd cid being columns, so as to record repeated frame ids of each cmnd;
Wherein, the nodes keep dual-link or multi-link redundant connection.
In the embodiment of the invention, all element values of the two groups are initialized to be 0 by default. During subsequent cmnd uses, the element values of the corresponding two-dimensional array change.
S20: the CL layer initiator end initiates an IO application, the CL layer target end receives the IO application, and the target end returns a feedback result to the initiator end;
the target end performs repeated frame detection on the IO application;
if the repeated frame check fails, the IO flow is stopped, the UI raises an alarm, and the result is fed back to the initiator so that it resets the cmnd's repeated frame count value to 0;
if other abnormal conditions occur, a reset to 0 or a one-step rollback is likewise performed, so that subsequent IO can proceed normally.
Each link supports up to credit_max concurrent IOs (credit_max differs by link type). Each IO takes the concrete form of a protocol command (cmnd) structure; each cmnd is allocated from a cmnd resource pool, has a unique cid, and is recycled after use.
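The resource model just described (a recyclable cmnd pool with unique cids, and at most credit_max outstanding IOs per link) can be sketched as follows. The class names, the `dup_id` field name, and the sizes are illustrative assumptions; the patent does not define them.

```python
from dataclasses import dataclass, field


@dataclass
class Cmnd:
    cid: int         # unique within the pool; also the column index into the tables
    dup_id: int = 0  # repeated frame count carried by the command (field name assumed)


class CmndPool:
    """Fixed-size pool of recyclable cmnds, each with a unique cid."""

    def __init__(self, size: int) -> None:
        self._free = [Cmnd(cid) for cid in range(size)]

    def alloc(self) -> Cmnd:
        if not self._free:
            raise RuntimeError("cmnd pool exhausted")
        return self._free.pop()

    def release(self, cmnd: Cmnd) -> None:
        # The cmnd keeps its cid, so a later alloc() can hand out the same cid again.
        self._free.append(cmnd)


@dataclass
class Link:
    credit_max: int                              # depends on the link type
    in_flight: set = field(default_factory=set)  # cids currently outstanding

    def try_send(self, cmnd: Cmnd) -> bool:
        """Admit the cmnd only while fewer than credit_max IOs are outstanding."""
        if len(self.in_flight) >= self.credit_max:
            return False
        self.in_flight.add(cmnd.cid)
        return True

    def complete(self, cmnd: Cmnd) -> None:
        self.in_flight.discard(cmnd.cid)


pool = CmndPool(size=4)
link = Link(credit_max=2)
a, b, c = pool.alloc(), pool.alloc(), pool.alloc()
admitted = [link.try_send(a), link.try_send(b), link.try_send(c)]  # third exceeds credit
link.complete(a)
pool.release(a)
reused = pool.alloc()  # the released cmnd comes back with the same cid
```

Because cids are reused, a late or duplicated frame for an old IO can carry the same cid as a new IO; the per-cid repeated frame count is what tells the two apart.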
In the embodiment of the present invention, when a link is unexpectedly disconnected, all cmnds that were sent from the initiator end of that link but have not received feedback are traversed, and their repeated frame count values in the initiator_duplicated_ids table are reset to 0. The next time such a cmnd is reused, its repeated frame count value starts from 0 and passes the target's repeated frame detection. The repeated frame count values of the other cmnds remain unchanged, because their values at the initiator and target ends are already the same.
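The link-failure rule can be sketched as below, assuming the initiator already knows which cids were sent on the failed link without feedback. How that set is tracked is not specified in the patent; `unanswered_cids` is a hypothetical input.

```python
def on_link_down(initiator_duplicated_ids, peer_node, unanswered_cids):
    """Reset the counts of cmnds that went out on the failed link without feedback.

    Only these cmnds may be out of step with the target; every other cmnd
    still has matching initiator and target values and is left untouched.
    """
    for cid in unanswered_cids:
        initiator_duplicated_ids[peer_node][cid] = 0


# Example: cids 0 and 2 toward node 1 were outstanding when the link dropped.
ids = [[0, 0, 0, 0], [5, 2, 7, 1]]
on_link_down(ids, peer_node=1, unanswered_cids=[0, 2])
```

After the call, row 1 is `[0, 2, 0, 1]`: only the two unanswered cids restart from 0.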
In the embodiment of the invention, when all links between two nodes are disconnected, i.e., the nodes lose contact with each other, the initiator end resets to 0 every element in the row of initiator_duplicated_ids indexed by that node id. After the nodes reconnect successfully, all the related repeated frame count values start again from 0.
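When the whole node is lost, the reset widens from selected cids to the entire row. A minimal sketch, assuming the same row-per-node table layout as in S10:

```python
def on_node_lost(initiator_duplicated_ids, peer_node):
    """All links to peer_node are down: restart every count for that node at 0."""
    row = initiator_duplicated_ids[peer_node]
    # Reset in place so any other reference to this row sees the new values.
    row[:] = [0] * len(row)


ids = [[3, 1, 4], [1, 5, 9]]
on_node_lost(ids, peer_node=1)
```

Rows for other nodes are untouched, so IO to the rest of the cluster keeps its sequence.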
In the embodiment of the invention, the specific process by which the CL layer initiator end initiates an IO application is as follows:
A free cmnd is taken from the cmnd resource pool, and its repeated frame count value is read from initiator_duplicated_ids, written into the corresponding field of the cmnd, and sent to the peer node. The cmnd's repeated frame count value in the initiator_duplicated_ids array is then incremented for the next use.
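The initiator-side send step can be sketched as follows. `transmit` is a hypothetical callback standing in for the real transport, and the `dup_id` field name is assumed. Python integers do not overflow; the patent does not say how a fixed-width count field would wrap.

```python
from dataclasses import dataclass


@dataclass
class Cmnd:
    cid: int
    dup_id: int = 0  # repeated frame count field (name assumed)


def initiator_send(initiator_duplicated_ids, peer_node, cmnd, transmit):
    """Stamp the cmnd with the current count, send it, then advance the count."""
    cmnd.dup_id = initiator_duplicated_ids[peer_node][cmnd.cid]
    transmit(peer_node, cmnd)
    initiator_duplicated_ids[peer_node][cmnd.cid] += 1


sent = []
ids = [[0] * 4 for _ in range(2)]
cmnd = Cmnd(cid=2)
for _ in range(3):  # the same cmnd (cid 2) reused for three IOs to node 1
    initiator_send(ids, 1, cmnd, lambda node, c: sent.append((node, c.dup_id)))
```

Each reuse of cid 2 carries the next count (0, 1, 2), which is what the target compares against its own table.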
In the embodiment of the present invention, the specific process by which the CL layer target end receives an IO application is as follows:
The repeated frame count value for the cid is read from the target_duplicated_ids array and compared with the repeated frame count value in the corresponding field of the received cmnd. If they are equal, the match succeeds and the IO flow continues. A received count value of 0 is also treated as a successful match, in which case the value in the target_duplicated_ids array is first reset to 0. The value in the target_duplicated_ids array is then incremented, keeping it synchronized with the initiator side. If the match fails, the target actively aborts the IO, marks the feedback result as aborted, and issues a repeated frame alarm to the UI interface.
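The target-side check can be sketched as below. Returning a small dict as the feedback result is an illustrative assumption; raising the UI alarm is left to the caller.

```python
def target_receive(target_duplicated_ids, src_node, cid, received_dup_id):
    """Check one incoming cmnd and return the feedback to send to the initiator."""
    expected = target_duplicated_ids[src_node][cid]
    if received_dup_id == 0:
        # A count of 0 always matches: the initiator restarted this cid.
        target_duplicated_ids[src_node][cid] = 0
    elif received_dup_id != expected:
        # Repeated (or out-of-sequence) frame: abort and report both values.
        return {"cid": cid, "abort": True,
                "expected": expected, "received": received_dup_id}
    # Match: advance the count to stay in step with the initiator.
    target_duplicated_ids[src_node][cid] += 1
    return {"cid": cid, "abort": False}


ids = [[0] * 4 for _ in range(2)]
r1 = target_receive(ids, 1, 2, 0)  # first IO on cid 2
r2 = target_receive(ids, 1, 2, 1)  # next IO, in sequence
r3 = target_receive(ids, 1, 2, 1)  # same frame delivered again
r4 = target_receive(ids, 1, 2, 0)  # initiator restarted after the abort
```

The third call is rejected because the target already expects count 2; the fourth succeeds because the initiator, after receiving the abort, starts again from 0.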
In an embodiment of the present invention, issuing the repeated frame alarm on the UI interface further includes:
The alarm information indicates the wwpn value of the specific physical link, to help locate the physical device and to guide subsequent troubleshooting and maintenance.
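The alarm content can be sketched as a formatted message. A WWPN (World Wide Port Name) is a 64-bit identifier usually shown as eight colon-separated hex bytes; the function name, message wording, and the example WWPN value below are hypothetical.

```python
def format_repeated_frame_alarm(wwpn: int, peer_node: int, cid: int,
                                expected: int, received: int) -> str:
    """Build a UI alarm string that names the physical link by its wwpn."""
    # Render the 64-bit WWPN as 8 colon-separated hex bytes.
    hex_digits = f"{wwpn:016x}"
    wwpn_text = ":".join(hex_digits[i:i + 2] for i in range(0, 16, 2))
    return (f"Repeated frame detected on link wwpn {wwpn_text}: "
            f"peer node {peer_node}, cid {cid}, "
            f"expected {expected}, received {received}")


msg = format_repeated_frame_alarm(0x500507680B215E7F, peer_node=1, cid=2,
                                  expected=2, received=1)
```

Including the wwpn rather than only the node id is what lets an operator go straight to the faulty port or cable among the redundant links.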
In the embodiment of the invention, when the initiator receives a feedback result marked with abort, it resets the cmnd's repeated frame count value in initiator_duplicated_ids to 0, so that the value 0 is used the next time the cmnd is called.
Other abnormal conditions include:
a failure result fed back by the common interface platform: the initiator sends an IO request and the peer responds through underlying processing, but failure feedback is obtained; the cmnd's repeated frame count value is reset to 0;
incomplete received data, i.e., frame loss, where the size of the data received is inconsistent with the size of the data sent by the target; the cmnd's repeated frame count value is reset to 0;
blocking during underlying processing, where the IO is blocked by the SAN network and cannot be sent out normally: the cmnd has left the CL layer and reached the common interface platform layer and the driver layer, but for physical reasons the IO is never transmitted and is fed back directly to the CL layer. In this case the cmnd's repeated frame count value is decremented by one, the IO is invalidated, and the next call of the cmnd transmits the previous value again;
other status value errors, i.e., a status assigned to the cmnd by the peer target or the underlying driver indicates an abnormal condition; the cmnd's repeated frame count value is reset to 0.
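The initiator-side recovery rules (the abort feedback from the target plus the four other abnormal conditions) reduce to one decision: roll back one step when the IO never reached the target, otherwise reset to 0. A sketch follows; the `Outcome` names are illustrative labels, not status codes from the patent, and the `max(0, ...)` guard is an added assumption not stated in the source.

```python
from enum import Enum, auto


class Outcome(Enum):
    OK = auto()
    ABORTED = auto()           # feedback marked abort by the target (repeated frame)
    PLATFORM_FAILURE = auto()  # common interface platform fed back a failure
    INCOMPLETE_DATA = auto()   # received size differs from size sent (frame loss)
    NOT_SENT = auto()          # blocked below the CL layer, never reached the target
    BAD_STATUS = auto()        # target or driver set an abnormal status on the cmnd


def apply_outcome(initiator_duplicated_ids, peer_node, cid, outcome):
    """Adjust the initiator count for one completed cmnd according to its outcome."""
    if outcome is Outcome.OK:
        return  # initiator and target advanced together; nothing to fix
    if outcome is Outcome.NOT_SENT:
        # The target never saw this value, so step back and resend it next time.
        # max() guards against a count that was already reset to 0 meanwhile.
        count = initiator_duplicated_ids[peer_node][cid]
        initiator_duplicated_ids[peer_node][cid] = max(0, count - 1)
    else:
        # Target state is uncertain: restart at 0, which the target always accepts.
        initiator_duplicated_ids[peer_node][cid] = 0


ids = [[0] * 4 for _ in range(2)]
ids[1][2] = 5
apply_outcome(ids, 1, 2, Outcome.NOT_SENT)         # 5 -> 4
after_rollback = ids[1][2]
apply_outcome(ids, 1, 2, Outcome.INCOMPLETE_DATA)  # 4 -> 0
ids[1][3] = 3
apply_outcome(ids, 1, 3, Outcome.OK)               # unchanged
```

Resetting to 0 is the safe default because the target treats 0 as a successful match and resynchronizes; the rollback is only safe when the frame is known not to have left the node.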
Those skilled in the art will appreciate that all or part of the flows in the above embodiment methods may be implemented by a computer program instructing the relevant hardware. The program may be stored on a non-volatile computer-readable storage medium and, when executed, may include the flows of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The serial numbers of the foregoing embodiments of the present invention are for description only and do not indicate that any embodiment is better or worse than another.
It will be appreciated by persons skilled in the art that the foregoing discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the spirit of the embodiments of the invention, technical features of the above embodiments or of different embodiments may be combined, and many other variations of the different aspects of the embodiments exist that are not described in detail for brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. made within the spirit and principles of the embodiments shall be included in the protection scope of the embodiments of the present invention.

Claims (9)

1. A repeated frame control method for cluster communication in a storage system, characterized by comprising: initializing global two-dimensional arrays initiator_duplicated_ids and target_duplicated_ids, with node ids as rows and cmnd cids as columns, for recording the repeated frame id of each cmnd;
wherein the nodes maintain dual-link or multi-link redundant connections;
initiating an IO application at a CL layer initiator end, receiving the IO application at a CL layer target end, and returning, by the target end, a feedback result to the initiator end;
performing, by the target end, repeated frame detection on the IO application;
if the repeated frame check fails, stopping the IO flow, raising an alarm in the UI, and feeding back to the initiator so that it resets the count value to 0;
if other abnormal conditions exist, likewise resetting to 0 or rolling back one step, so that subsequent IO can proceed normally;
wherein the CL layer target end receives the IO application as follows:
acquiring the repeated frame count value of the cid from the target_duplicated_ids array and comparing it with the repeated frame count value in the corresponding field of the received cmnd, and if the two are equal, continuing the IO flow;
if the matching fails, actively aborting the IO at the target, marking the feedback result as aborted, and issuing a repeated frame alarm to the UI interface;
the other abnormal conditions comprise a failure fed back by the common interface platform, abnormal data transmission integrity, abnormal physical transmission blocking, and an abnormal status value.
2. The method of claim 1, wherein each link carries up to credit_max concurrent IOs, each IO takes the form of a protocol command (cmnd) structure, each cmnd is allocated from a cmnd resource pool, and each cmnd has a unique cid.
3. The repeated frame control method for cluster communication in a storage system according to claim 2, wherein, when a link is unexpectedly disconnected, all cmnds sent from the initiator end of that link without receiving feedback are traversed, and their corresponding repeated frame count values in the initiator_duplicated_ids table are reset to 0.
4. The repeated frame control method for cluster communication in a storage system according to claim 2, wherein, when all links between two nodes are disconnected, i.e., the nodes lose connection, all elements in the row of the initiator_duplicated_ids table corresponding to that node id are reset to 0 at the initiator end, and after the nodes successfully reconnect, the related repeated frame count values all start from 0.
5. The repeated frame control method for cluster communication in a storage system according to claim 1, wherein initiating the IO application at the CL layer initiator end comprises the following specific steps:
taking a free cmnd from the cmnd resource pool, obtaining the cmnd's repeated frame count value from initiator_duplicated_ids, assigning the count value to the corresponding field of the cmnd, and sending it to the peer node.
6. The repeated frame control method for cluster communication in a storage system according to claim 1, wherein issuing the repeated frame alarm to the UI interface further comprises:
the alarm information indicating the wwpn value of the specific physical link, to assist in locating the physical device and to guide subsequent troubleshooting and maintenance.
7. The repeated frame control method for cluster communication in a storage system according to claim 1, wherein,
if the feedback result received by the initiator carries an abort mark, the cmnd's repeated frame count value in initiator_duplicated_ids is reset to 0, and the value 0 is used the next time the cmnd is called.
8. A computer device, comprising a memory storing a computer program and a processor, wherein the processor, when loading and executing the computer program, implements the steps of the repeated frame control method for cluster communication in a storage system according to any one of claims 1 to 7.
9. A storage medium storing a computer program which, when loaded and executed by a processor, implements the steps of the repeated frame control method for cluster communication in a storage system according to any one of claims 1 to 7.
CN202211324133.5A 2022-10-27 2022-10-27 Repeated frame control method, terminal and storage medium for cluster communication in storage system Active CN115914255B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211324133.5A CN115914255B (en) 2022-10-27 2022-10-27 Repeated frame control method, terminal and storage medium for cluster communication in storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211324133.5A CN115914255B (en) 2022-10-27 2022-10-27 Repeated frame control method, terminal and storage medium for cluster communication in storage system

Publications (2)

Publication Number Publication Date
CN115914255A CN115914255A (en) 2023-04-04
CN115914255B true CN115914255B (en) 2025-08-01

Family

ID=86475138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211324133.5A Active CN115914255B (en) 2022-10-27 2022-10-27 Repeated frame control method, terminal and storage medium for cluster communication in storage system

Country Status (1)

Country Link
CN (1) CN115914255B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109314655A (en) * 2016-06-10 2019-02-05 Tt柔性技术有限公司 The receiving frame at the redundancy port for connecting the node to communication network
CN114328317A (en) * 2021-11-30 2022-04-12 苏州浪潮智能科技有限公司 A method, device and medium for improving communication performance of a storage system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8010829B1 (en) * 2005-10-20 2011-08-30 American Megatrends, Inc. Distributed hot-spare storage in a storage cluster
CN113392053B (en) * 2021-06-25 2022-08-05 苏州浪潮智能科技有限公司 A storage system, a communication method and components


Also Published As

Publication number Publication date
CN115914255A (en) 2023-04-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 215000 Building 9, No.1 guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Suzhou City, Jiangsu Province

Patentee after: Suzhou Yuannao Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: 215000 Building 9, No.1 guanpu Road, Guoxiang street, Wuzhong Economic Development Zone, Suzhou City, Jiangsu Province

Patentee before: SUZHOU LANGCHAO INTELLIGENT TECHNOLOGY Co.,Ltd.

Country or region before: China