
CN120803374B - Storage systems and storage system clusters - Google Patents


Info

Publication number: CN120803374B (application CN202511299744.2A)
Authority: CN (China)
Legal status: Active
Inventors: 孔维宾, 吴常顺, 李帅帅
Assignee (original and current): Suzhou Metabrain Intelligent Technology Co Ltd
Other versions: CN120803374A (Chinese)
Application filed by Suzhou Metabrain Intelligent Technology Co Ltd
Priority: CN202511299744.2A

Classifications

    • G06F3/061: Improving I/O performance (interfaces specially adapted for storage systems)
    • G06F3/0614: Improving the reliability of storage systems
    • G06F3/0629: Configuration or reconfiguration of storage systems
    • G06F3/067: Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F15/163: Interprocessor communication
    • G06F9/544: Buffers; shared memory; pipes


Abstract

The invention provides a storage system and a storage system cluster, applicable to the server and storage fields. The storage system comprises a plurality of control nodes and at least one interconnection device. Each interconnection device comprises a first switching module and a memory module; the first switching module is electrically connected to the plurality of control nodes and to the memory module, and the first switching modules of the at least one interconnection device are connected to one another. The first switching module writes first cache data provided by a control node into the memory module, or provides second cache data stored in the memory module to a control node, so that the plurality of control nodes share a cache through the at least one interconnection device.

Description

Storage system and storage system cluster
Technical Field
The present invention relates to the field of servers and storage technologies, and in particular, to a storage system and a storage system cluster.
Background
In the digital age of explosively growing data volumes, the demands on the performance, reliability, and scalability of storage systems, the core infrastructure for carrying and managing data, keep rising. Mainstream high-end storage devices use multi-controller architectures: common dual-controller and quad-controller schemes can be deployed in topologies such as one chassis with two controllers, one chassis with four controllers, or two chassis with four controllers, to meet the concurrent-access and load-balancing requirements of large-scale data processing.
To enable data sharing and cooperation among multiple controllers, the related art mainly builds interconnection links with NTB (Non-Transparent Bridge) or RoCE (RDMA over Converged Ethernet) schemes. However, schemes built on NTB or RoCE suffer from performance bottlenecks and cannot meet a new-generation storage system's combined requirements for high-performance interconnection, centralized caching, and low-cost power backup.
Disclosure of Invention
In view of the foregoing, the present invention provides a storage system and a storage system cluster.
One aspect of the invention provides a storage system comprising a plurality of control nodes and at least one interconnection device. Each interconnection device comprises a first switching module and a memory module; the first switching module is electrically connected to the plurality of control nodes and to the memory module, and the first switching modules of the at least one interconnection device are connected to one another. The first switching module writes first cache data provided by a control node into the memory module, or provides second cache data stored in the memory module to a control node, so that the plurality of control nodes share a cache through the at least one interconnection device.
Another aspect of the invention provides a storage system cluster comprising at least two storage systems as described above.
Drawings
The above and other objects, features, and advantages of the present invention will become more apparent from the following description of embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1A illustrates a schematic diagram of an example controller node.
FIG. 1B illustrates a schematic diagram of an example ROCE-based implementation of a four-control shared storage architecture.
FIG. 2 shows a schematic diagram of a storage system according to an embodiment of the invention.
FIG. 3 shows a schematic diagram of a storage system according to another embodiment of the invention.
FIG. 4 shows a schematic diagram of a storage system according to another embodiment of the invention.
FIG. 5 shows a schematic diagram of a storage system according to another embodiment of the invention.
FIG. 6 shows a schematic diagram of a data-flushing (cache-to-disk persistence) scheme according to an embodiment of the invention.
FIG. 7 shows a schematic diagram of a storage system according to another embodiment of the invention.
FIG. 8A shows a schematic diagram of a storage system according to another embodiment of the invention.
FIG. 8B shows a schematic diagram of a storage system according to another embodiment of the invention.
FIG. 9 shows a schematic diagram of a storage system according to another embodiment of the invention.
FIG. 10 shows a schematic diagram of a storage system according to another embodiment of the invention.
FIG. 11 illustrates a schematic diagram of a storage system cluster, in accordance with an embodiment of the invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where a convention analogous to "at least one of A, B, and C" is used, it should be interpreted as one skilled in the art would generally understand it (e.g., "a system having at least one of A, B, and C" includes, but is not limited to, systems having A alone, B alone, C alone, A and B, A and C, B and C, or A, B, and C together).
In the digital age of explosively growing data volumes, the demands on the performance, reliability, and scalability of storage systems, the core infrastructure for carrying and managing data, keep rising. Mainstream high-end storage devices use multi-controller architectures, and common dual-controller and quad-controller schemes can be deployed in topologies such as one chassis with two controllers, one chassis with four controllers, or two chassis with four controllers, to meet the concurrent-access and load-balancing requirements of large-scale data processing. "One chassis with two controllers" means two controllers inserted into a single chassis or frame; the other structures, such as one chassis with four controllers and two chassis with four controllers, follow by analogy.
In order to realize data sharing and cooperative work among multiple controllers, the related technology mainly adopts an NTB or ROCE scheme to construct an interconnection link. The following describes a multi-control shared memory architecture in the related art, taking ROCE as an example.
FIG. 1A illustrates a schematic diagram of an example controller node.
As shown in FIG. 1A, a controller node may contain multiple memories, a system disk, a central processor, and multiple RoCE chips. The central processor is connected to each of the memories, the system disk, and the RoCE chips, and the controller node connects to other controller nodes through its RoCE chips.
FIG. 1B illustrates a schematic diagram of an example ROCE-based implementation of a four-control shared storage architecture.
As shown in FIG. 1B, four controller nodes may form a quad-controller shared-storage architecture with a full-mesh topology: each controller node contains three RoCE chips, and each RoCE chip connects to a RoCE chip on one of the other three controller nodes, interconnecting all four controller nodes.
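The wiring cost of such a full mesh grows quadratically with controller count; a small sketch (function names are illustrative, not from the patent) makes this concrete:

```python
def mesh_links(n_nodes: int) -> int:
    # Point-to-point links in a full mesh: n choose 2.
    return n_nodes * (n_nodes - 1) // 2

def roce_chips_per_node(n_nodes: int) -> int:
    # One RoCE chip per peer, as in the architecture above.
    return n_nodes - 1

# Four controllers: 3 chips per node and 6 mesh links in total.
print(roce_chips_per_node(4), mesh_links(4))
```

Scaling the same mesh to eight controllers would already need 7 chips per node and 28 links, which is part of why the patent moves to a switched topology.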
However, the RoCE-based quad-controller shared-storage architecture has significant drawbacks. On one hand, the network topology is complex: each controller node needs several additional dedicated chips, which complicates board layout and routing and greatly increases hardware cost. On the other hand, link latency is a prominent problem. For example, in a transmission-latency test of one quad-controller shared-storage architecture, the maximum latency was 5 μs for an 8 KB packet, up to 8 μs for a 32 KB packet, and up to 10 μs for a 36 KB packet. Such high latency directly constrains overall data-processing performance and makes it difficult to meet scenarios with strict real-time requirements, such as finance and industrial control.
Meanwhile, the security of a storage system's cache data is one of its core design metrics. In a multi-controller environment, if a controller node suffers an AC (alternating current) power failure, data not yet persisted in the cache must be written to the data disks in time to avoid data loss. Related schemes commonly give each controller node its own BBU (battery backup unit): after a power failure, the battery keeps the cache alive until the data has been flushed to disk. Although this protects the data, it has obvious drawbacks, including wasted hardware resources and stacked costs. Specifically, a multi-controller architecture requires as many BBUs as controllers, which increases chassis-space occupancy, limits the expansion of other components, and greatly raises whole-machine hardware cost through repeated BBU deployment, reducing the product's cost-effectiveness and running counter to the trend toward high-density, low-cost storage systems.
To address industry pain points such as memory capacity, sharing efficiency, and input/output (IO) latency, related enterprises have introduced CXL (Compute Express Link) technology. CXL is an open industry-standard interconnect protocol with low latency, high bandwidth, and memory-sharing-friendly semantics; it connects seamlessly to peripherals such as FPGAs (field-programmable gate arrays) and GPUs (graphics processing units) and has clear application potential in the server and storage fields.
However, current multi-controller sharing schemes for storage systems have not fully exploited the advantages of CXL. They remain limited by the performance bottlenecks of conventional NTB/RoCE schemes and the cost of per-controller BBUs, and cannot meet a new-generation storage system's combined requirements for high-performance interconnection, centralized caching, and low-cost power backup; an innovative CXL-based architecture is needed to break through these design limitations.
In view of this, embodiments of the invention provide a cache-sharing scheme based on a three-layer control node / interconnection device / memory module architecture: the first switching module of the interconnection device centrally connects the control nodes to the memory and cascades multiple interconnection devices, addressing the complexity, latency, and scalability problems of multi-node cache sharing. Embodiments of the invention provide a storage system and a storage system cluster. The storage system comprises a plurality of control nodes and at least one interconnection device; each interconnection device comprises a first switching module and a memory module; the first switching module is electrically connected to the plurality of control nodes and to the memory module, and the first switching modules of the at least one interconnection device are connected to one another; the first switching module writes first cache data provided by a control node into the memory module, or provides second cache data stored in the memory module to a control node, so that the plurality of control nodes share a cache through the at least one interconnection device.
FIG. 2 shows a schematic diagram of a memory system according to an embodiment of the invention.
As shown in fig. 2, the storage system may include a plurality of control nodes 10 and at least one interconnect device 20.
The plurality of control nodes 10 serve as the storage system's control terminals. Each control node 10 is a core computing unit of the storage system, responsible for business IO processing, data read/write scheduling, and local resource management; in the multi-controller sharing scheme it is also the initiator and consumer of cache data.
The interconnect device 20 may be a jointly shared storage multi-controller enclosure (JBOM). In the storage system, the interconnect device 20 serves as the cache-resource pooling center and redundancy control center, the core hub of the multi-controller sharing scheme.
The interconnect device 20 may include a first switch module 21 and a memory module 22, the first switch module 21 being electrically connected to the plurality of control nodes 10 and to the memory module 22, respectively.
The first switch module 21 may be a switch chip supporting the CXL protocol, enabling high-speed interconnection of the plurality of control nodes 10 with the memory module 22.
The first switch module 21 may connect to the control nodes 10 and the memory module 22 via PCIe (Peripheral Component Interconnect Express) links. In one example, each control node 10 uses an 8-lane PCIe link for reads and another 8-lane link for writes to optimize read/write performance, so each control node 10 may be electrically connected to the first switch module 21 through a 16-lane PCIe link.
Memory module 22 may include a plurality of memory banks supporting the CXL protocol. In one example, each memory bank uses a half-duplex design and is electrically connected to the first switch module 21 through an 8-lane PCIe link: the link between a memory bank and the first switch module 21 carries either a read or a write at a time, not both simultaneously. This saves PCIe lane resources on the first switch module 21, allowing a greater number of memory banks to be configured in the interconnect device 20.
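Under these lane counts (16 lanes per control node, 8 per half-duplex memory bank), the number of banks a switch can host follows from a simple lane budget; the switch's total lane count below is an assumed figure for illustration only:

```python
def max_memory_banks(total_lanes: int, n_nodes: int,
                     node_lanes: int = 16, bank_lanes: int = 8) -> int:
    # Lanes left after connecting the control nodes, divided among banks.
    remaining = total_lanes - n_nodes * node_lanes
    return max(remaining // bank_lanes, 0)

# A hypothetical 144-lane switch serving 4 control nodes leaves lanes for
# 10 half-duplex banks; full-duplex 16-lane banks would fit only 5.
print(max_memory_banks(144, 4), max_memory_banks(144, 4, bank_lanes=16))
```

This is the trade-off the half-duplex design exploits: halving each bank's link width doubles the bank count for the same switch.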
The memory module 22 may provide a cache space that may be logically divided into corresponding cache partitions for each control node 10, and the cache data associated with each control node 10 may be stored in the cache partition associated with that control node 10.
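The logical partitioning can be sketched as follows (a toy in-memory model; the equal partition sizing and the names are assumptions, not the patent's implementation):

```python
class SharedCachePool:
    # Toy model of a memory module whose cache space is divided into
    # one cache partition per control node.
    def __init__(self, capacity: int, node_ids: list):
        self.partition_size = capacity // len(node_ids)
        self.partitions = {nid: {} for nid in node_ids}  # node id -> {offset: block}

    def write(self, node_id: str, offset: int, block: bytes) -> None:
        # Store a cache block in the partition associated with node_id.
        self.partitions[node_id][offset] = block

    def read(self, node_id: str, offset: int) -> bytes:
        # Fetch a cache block from node_id's partition.
        return self.partitions[node_id][offset]

pool = SharedCachePool(capacity=1 << 30, node_ids=["node0", "node1", "node2", "node3"])
pool.write("node0", 0, b"cached-block")
print(pool.partition_size, pool.read("node0", 0))
```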
The plurality of control nodes 10 may implement cache sharing through at least one interconnect device 20.
When writing cache data, the control node 10 provides first cache data to the first switch module 21, and the first switch module 21 writes it into the memory module 22, specifically into the cache partition of the memory module 22 associated with that control node 10.
When reading cache data, the first switch module 21 receives a data-read request from a control node 10, reads second cache data from the corresponding cache partition in the memory module 22 in response, and sends it to that control node 10, thereby providing the second cache data stored in the memory module 22 to the control node 10.
Alternatively, a control node 10 may access the memory module 22 directly through the first switch module 21, in a memory direct-access mode, to read and write cache data. For example, the control node 10 may write first cache data into the memory module 22 via the first switch module 21, or read second cache data from the memory module 22 via the first switch module 21, so that the plurality of control nodes 10 share the cache through the at least one interconnect device 20.
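Either access path ends with every node seeing one pool. In miniature, with a dictionary standing in for the memory module behind the first switch module:

```python
# A dict stands in for the memory module behind the first switch module.
memory_module = {}

def switch_write(owner: str, offset: int, block: bytes) -> None:
    # The first switch module writes a node's cache block into the shared pool.
    memory_module[(owner, offset)] = block

def switch_read(owner: str, offset: int) -> bytes:
    # Any control node may read a block another node cached: cache sharing.
    return memory_module[(owner, offset)]

switch_write("node0", 4096, b"dirty-page")   # written by control node 0
print(switch_read("node0", 4096))            # read back, e.g. by control node 1
```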
For the at least one interconnect device 20, the first switch modules 21 may be connected to one another, interconnecting the first switch modules 21 and thereby establishing connection paths between the memory modules 22 that keep the cache data recorded in them consistent.
In the storage system, the at least one interconnect device 20 may back up cache data by hot backup. For example, with n interconnect devices 20, any control node 10 may be connected to the first switch modules 21 of all n devices. When the control node 10 generates cache data, it may replicate the data into n copies and send one copy to the first switch module 21 of each interconnect device 20; each first switch module 21 then writes its copy into that device's memory module 22, hot-backing up the control node's cache data across the n interconnect devices 20.
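The n-way hot backup described above amounts to replicating each write into every interconnect device's memory module; a minimal sketch (dicts model the memory modules):

```python
def hot_backup_write(offset: int, block: bytes, devices: list) -> None:
    # Copy the cache block to every interconnect device's memory module
    # (each device modeled here as a plain dict).
    for memory_module in devices:
        memory_module[offset] = block

devices = [dict() for _ in range(3)]  # n = 3 interconnect devices
hot_backup_write(0, b"block-A", devices)
print(all(m[0] == b"block-A" for m in devices))
```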
Alternatively, the at least one interconnect device 20 may comprise a master interconnect device and one or more slave interconnect devices: the master stores cache data by hot backup and synchronizes it to the slaves by cold backup. Taking 1 master and m slaves as an example, the control nodes 10 connect at least to the master's first switch module 21, and the master's first switch module 21 connects to the first switch modules 21 of the m slaves. When a control node 10 generates cache data, it sends the data at least to the master's first switch module 21, which writes it into the master's memory module, hot-backing it up on the master. When the master meets a cold-backup condition, it synchronizes the cache data recorded in its memory module 22 to the memory modules 22 of the m slaves over the links between its first switch module 21 and theirs, cold-backing up the data on the m slaves. Cold-backup conditions may include, but are not limited to, the storage system stopping work, or the master not reading or writing cache data for a certain period.
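The master/slave variant can be sketched with an explicit cold-backup trigger; the idle threshold below is an assumed instance of the "no reads or writes for a certain period" condition, not a value from the patent:

```python
def cold_backup_sync(master: dict, slaves: list, idle_seconds: float,
                     idle_threshold: float = 60.0) -> bool:
    # Synchronize the master's cached blocks to every slave once the
    # cold-backup condition holds; returns whether a sync happened.
    if idle_seconds < idle_threshold:
        return False
    for slave in slaves:
        slave.clear()
        slave.update(master)
    return True

master = {0: b"a", 8: b"b"}   # master interconnect device's memory module
slaves = [dict(), dict()]     # m = 2 slave interconnect devices
print(cold_backup_sync(master, slaves, idle_seconds=120.0), slaves[0] == master)
```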
According to an embodiment of the invention, the plurality of control nodes may each be electrically connected to the first switching module of the at least one interconnection device, so that every control node shares the memory modules of the at least one interconnection device. With the three-layer control node / interconnection device / memory module architecture, the first switching module of the interconnection device centrally connects the control nodes to the memory and cascades multiple interconnection devices, pooling the cache data in a single shared cache pool. This at least partially solves the complexity, latency, and scalability problems of multi-node cache sharing, effectively improves cache-sharing efficiency in a multi-controller shared storage system, and reduces cache read/write latency. Moreover, through the links between the first switching modules of the at least one interconnection device, the devices can share data in multiple backup modes, reducing the cache-data loss rate and improving the storage system's reliability.
The storage system shown in FIG. 2 is further described with reference to the accompanying drawings in connection with a specific embodiment.
Optionally, each interconnection device may contain a first processor, with the first processors of the at least one interconnection device interconnected. When a single interconnection device fails, the first processors of the other interconnection devices can back up its cache data over the interconnecting links, ensuring data security.
FIG. 3 shows a schematic diagram of a memory system according to another embodiment of the invention.
As shown in FIG. 3, in the storage system the interconnect device 20 further comprises a first processor 23, and the first processors 23 of the at least one interconnect device 20 are connected to one another.
The first processor 23 may be the CPU (central processing unit) of the interconnect device 20.
The first processors 23 of the at least one interconnect device 20 may be interconnected by an NTB, a network, or another type of link, without limitation.
In the storage system, the connection channel between the first switch modules 21 of at least one interconnect device 20 may be used for cache synchronization and data mirroring under normal operation, i.e. hot and cold backups of cached data as described above. The connection path between the first processors 23 of at least one interconnect device 20 may be used for emergency data backup in case of a failure of the interconnect device 20. That is, in the case that one interconnect device fails, the other interconnect devices are configured to read, via the first processor of the failed interconnect device, the third cache data from the memory module of the failed interconnect device, and write the third cache data to the memory module of the other interconnect device.
For example, the storage system may contain 3 interconnect devices, JBOM1, JBOM2, and JBOM3, whose first processors CPU1, CPU2, and CPU3 are interconnected via NTB. If interconnect device JBOM1 fails, the first processor CPU2 of JBOM2 may, over the NTB path between itself and CPU1 of JBOM1 and via CPU1, read cache data from JBOM1's memory module and write it into JBOM2's memory module. Similarly, CPU3 of JBOM3 may, over the NTB path between itself and CPU1 and via CPU1, read cache data from JBOM1's memory module and write it into JBOM3's memory module. JBOM2 and JBOM3 thus implement an emergency backup of data upon the failure of JBOM1.
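The emergency path can be sketched the same way, with the NTB transfer abstracted as a direct read of the failed device's memory (device names follow the example above):

```python
def emergency_backup(failed_memory: dict, survivors: list) -> None:
    # Each surviving interconnect device pulls the failed device's cached
    # blocks over the processor-to-processor (e.g. NTB) path and stores
    # them in its own memory module.
    for peer_memory in survivors:
        peer_memory.update(failed_memory)

jbom1_memory = {0: b"x", 8: b"y"}       # failed device JBOM1's memory module
jbom2_memory, jbom3_memory = {}, {}
emergency_backup(jbom1_memory, [jbom2_memory, jbom3_memory])
print(jbom2_memory == jbom1_memory and jbom3_memory == jbom1_memory)
```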
Optionally, the first processor of one interconnection device may detect the working state of the first processor of the other interconnection device in real time to determine whether the other interconnection device has a fault, which is not limited herein.
According to this embodiment of the invention, the interconnection channels between the first switch modules and between the first processors of the at least one interconnect device enable rapid backup of cache data among the devices, effectively reducing the data-loss rate, improving the storage system's fault-handling capability, and enhancing cache-data security.
Alternatively, in the cached data emergency backup scenario described above, a failure of an interconnect device may manifest as a failure of its first processor, such as a power failure or crash of the first processor. Or the fault may manifest as an abnormal power supply of the interconnection device, in which case the interconnection devices may also share power with each other.
Fig. 4 shows a schematic diagram of a memory system according to another embodiment of the invention.
As shown in fig. 4, in the storage system, the interconnect device 20 may also include a main power module 24, which main power module 24 may be used to power the various components of the interconnect device 20. For example, the main power module 24 may form a power channel with the first switch module 21, the memory module 22, and the first processor 23, respectively, through which the main power module 24 may supply power to the first switch module 21, the memory module 22, and the first processor 23, respectively.
The main power module 24 may be a PSU (Power Supply Unit) of the interconnect device 20.
Alternatively, the main power module 24 may supply multiple voltage levels and types, for example 220V AC, 12V DC, 5V DC, and 3.3V DC, which is not limited herein.
Alternatively, a power interface may be formed in each interconnection device 20, and the power interfaces of the at least one interconnection device 20 may be connected to each other. When the main power module 24 operates abnormally, for example when it cannot normally supply at least one of 220V AC, 12V DC, 5V DC, or 3.3V DC, or when its supply voltage is too low, the first switch module 21, the memory module 22, and the first processor 23 may enter an abnormal working state due to insufficient power, so that the interconnect device 20 fails. In the event of such a failure, the main power module 24 of another interconnect device may serve as a backup power source for the failed interconnect device, powering it via the interconnection link between the power interfaces of the at least one interconnect device 20.
For example, three interconnection devices may be disposed in the storage system, namely interconnection device JBOM1, interconnection device JBOM2, and interconnection device JBOM3, where the main power module PSU1 of JBOM1, the main power module PSU2 of JBOM2, and the main power module PSU3 of JBOM3 may be interconnected through the power interfaces of the three devices. Under an abnormal condition, suppose the main power module PSU1 of JBOM1 cannot provide 220V AC power, so that the first switch module, the memory module, the first processor, etc. in JBOM1 are powered down. The first processor of JBOM2 may detect the power failure of JBOM1 and control the main power module PSU2 of JBOM2 to serve as the standby power source of JBOM1, supplying power to the first switch module, the memory module, the first processor, etc. in JBOM1 through the power supply path between PSU2 and PSU1. Similarly, the first processor of JBOM3 may detect the power failure of JBOM1 and control the main power module PSU3 of JBOM3 to serve as the standby power source of JBOM1, supplying power to the first switch module, the memory module, the first processor, etc. in JBOM1 through the power supply path between PSU3 and PSU1.
During the period in which interconnection devices JBOM2 and JBOM3 provide backup power for interconnection device JBOM1, JBOM2 and JBOM3 can also write the cache data recorded in JBOM1 into their own memory modules, realizing a backup of the cache data, which can effectively improve the security of the cache data.
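The power-failover detection in this example can be sketched as below. All names (`MainPSU`, `Device`, `monitor_and_backup`, the `backup_feeds` list) are illustrative assumptions; the sketch only models which peers have routed their PSUs onto the failed device's rail.

```python
# Hypothetical sketch: peer first processors detect that JBOM1's main PSU has
# lost its 220V AC input and switch their own main PSUs in as standby sources
# over the interconnected power interfaces.

class MainPSU:
    def __init__(self, ac_ok=True):
        self.ac_ok = ac_ok
        self.backup_feeds = []   # names of peers currently feeding this device

class Device:
    def __init__(self, name):
        self.name = name
        self.psu = MainPSU()

    def monitor_and_backup(self, peer):
        """First processor of this device detects the peer's power fault and
        routes its own main PSU onto the peer's power rail."""
        if not peer.psu.ac_ok and self.name not in peer.psu.backup_feeds:
            peer.psu.backup_feeds.append(self.name)

d1, d2, d3 = Device("JBOM1"), Device("JBOM2"), Device("JBOM3")
d1.psu.ac_ok = False            # JBOM1 loses its AC input
d2.monitor_and_backup(d1)
d3.monitor_and_backup(d1)
# JBOM1 is now fed by both peers' main PSUs
```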
In some embodiments, a standby power module may be disposed in the interconnection device, where the standby power module may be configured to perform standby power for the interconnection device, that is, in a case where a main power module of the interconnection device is abnormal due to power failure or the like, the standby power module may supply power to a first switching module, a memory module, a first processor, and the like of the interconnection device.
Fig. 5 shows a schematic diagram of a memory system according to another embodiment of the invention.
As shown in fig. 5, in the storage system, the interconnect device 20 may further include a backup power module 25, and the backup power module 25 may be used to power the various components of the interconnect device 20. For example, the standby power module 25 may form a power channel with the first switching module 21, the memory module 22, and the first processor 23, respectively, through which the standby power module 25 may supply power to the first switching module 21, the memory module 22, and the first processor 23, respectively.
The backup power module 25 may be a BBU (Battery Backup Unit) of the interconnect device 20.
Similar to the main power module 24, the standby power module 25 may also supply multiple voltage levels and types, for example 220V AC, 12V DC, 5V DC, and 3.3V DC, which is not limited herein. The backup power module 25 may hold less power than the main power module 24.
Alternatively, the power channels of the standby power module 25 and the main power module 24 for supplying power to the components of the interconnection apparatus 20 may be independent from each other, or the standby power module 25 and the main power module 24 may multiplex the power channels for supplying power to the components of the interconnection apparatus 20, that is, the standby power module 25 and the main power module 24 may supply power to the components of the interconnection apparatus 20 through the same power channels, which is not limited herein.
Alternatively, an interlock circuit may be formed between the standby power module 25 and the main power module 24: when the main power module 24 outputs power normally, the standby power module 25 does not output power; conversely, when the main power module 24 cannot output power normally, the standby power module 25 may output power.
Optionally, in the case that the interconnection device further includes a standby power module, the way in which backup power is provided when the main power module of one interconnection device fails is not limited. For example, the standby power module of that interconnection device may be used preferentially, or the main power modules of the other interconnection devices may be used preferentially, or the standby power module of that interconnection device and the main power modules of the other interconnection devices may be used simultaneously, and so on.
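The standby-power selection policy above is left open by the description; the following minimal sketch illustrates just one possible ordering, in which the local BBU is preferred and peer PSUs are used only when it is absent. The function name and its arguments are assumptions.

```python
# One possible backup-source selection policy (the patent does not fix one):
# prefer the device's own standby power module (BBU); otherwise fall back to
# the main power modules of healthy peer devices.

def select_backup_sources(local_bbu_present, peer_psus_ok):
    """Return the sources used to back up a device whose main PSU failed.

    peer_psus_ok: dict mapping peer device name -> whether its main PSU is healthy.
    """
    if local_bbu_present:
        return ["local-bbu"]
    return [name for name, ok in peer_psus_ok.items() if ok]
```

A policy that uses both simultaneously, as the text also allows, would simply concatenate the two lists instead of choosing between them.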
According to the embodiment of the invention, a main power module and a standby power module are arranged in each interconnection device, and when the main power module of one interconnection device is abnormal, the standby power module of that device and/or the main power modules of the other interconnection devices provide backup power for the failed device. This multi-level backup power arrangement can effectively improve the working stability of the interconnection devices, reduce the probability of losing cache data that has not yet been persisted, and ensure the security of the cache data.
Optionally, in order to further improve the security of the cached data, when an interconnection device triggers a standby-power event, the interconnection device may also land the cached data, i.e., persist the cache data held in the memory module.
Fig. 6 shows a schematic diagram of a data-landing scheme according to an embodiment of the present invention.
As shown in fig. 6, a storage module 26 may be further disposed in the interconnection device, where the storage module 26 may be electrically connected to the first processor 23, and the storage module 26 may be used to store the buffered data.
The storage module 26 may be any type of nonvolatile memory, such as a solid state disk, a mechanical hard disk, etc., and is not limited herein. Both the main power module and the backup power module may be used to power the memory module 26.
Specifically, when the main power module in the interconnection device is abnormal, the interconnection device triggers a standby power event, and power of the first processor 23, the storage module 26 and other components of the interconnection device is provided by other interconnection devices or by the standby power module 25 in the interconnection device. At this time, the first processor 23 may read the buffered data in the memory module 22 via the first switch module 21, and write the buffered data into the storage module 26, so as to implement the landing of the buffered data.
During standby power, the interconnection device can land all of its cache data, so that a subsequent shutdown of the interconnection device does not cause loss of the cache data. After the main power module of the interconnection device returns to normal, the first processor 23 may read the cached data from the storage module 26 and write it back into the memory module 22 along the original path, so that the plurality of control nodes can read and use the corresponding cached data from the memory module 22.
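The landing-and-restore cycle can be sketched as below. This is a minimal model, not an implementation: the class and method names are assumptions, the volatile cache and the non-volatile storage module are plain dictionaries, and the switch-module read path is elided.

```python
# Hypothetical sketch of the cache landing/restore cycle: on a standby-power
# event the first processor flushes the memory module to the non-volatile
# storage module; once mains power returns it reloads the cache.

class CacheLander:
    def __init__(self):
        self.memory_module = {}     # volatile cache (DRAM)
        self.storage_module = {}    # non-volatile storage (e.g. SSD)

    def on_standby_power_event(self):
        # read the cached data (via the first switch module) and persist it
        self.storage_module.update(self.memory_module)
        self.memory_module.clear()  # device may now power off safely

    def on_mains_restored(self):
        # write the data back into the memory module along the original path
        self.memory_module.update(self.storage_module)

lander = CacheLander()
lander.memory_module["node1/lba42"] = b"dirty"
lander.on_standby_power_event()     # BBU or peer PSU is powering the flush
lander.on_mains_restored()          # control nodes can read the cache again
```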
Optionally, the storage module configured in the interconnection device may also be used for power backup of the control node. For example, when any control node is powered down, the first processor may drop the buffered data associated with the control node to the storage module to avoid losing the buffered data associated with the control node. At this time, since the cache data of the control node is recorded in the memory module in advance, the control node itself can remove the standby power module on the control node without performing standby power operation, thereby saving the hardware cost of the storage system.
FIG. 7 shows a schematic diagram of a memory system according to another embodiment of the invention.
As shown in fig. 7, only one interconnect device may be provided in the storage system, and the interconnect device may further include a controller 27, and the controller 27 may be electrically connected to the first processor 23 and to the memory module 22.
The controller 27 may be a CPLD (Complex Programmable Logic Device). Both the main power module and the backup power module may be used to power the controller 27.
The controller 27 may be electrically connected to the plurality of control nodes 10, respectively, and may be configured to receive identification information from the control nodes 10, which may be used to indicate whether the corresponding control node 10 is powered down.
The controller 27, upon receiving the identification information from a control node 10, may determine whether the identification information indicates that the control node 10 is powered down, and if so, send a backup signal to the first processor 23. The backup signal may be used to indicate the control node 10 concerned.
The first processor 23 may be configured to read fourth cache data associated with the control node 10 from the memory module 22 and write the fourth cache data to the storage module 26 based on the backup signal.
Specifically, the first processor 23 may determine, after receiving the backup signal, the control node 10 indicated by the backup signal, determine a cache partition corresponding to the indicated control node 10 from the memory module 22, read all cache data in the cache partition to obtain fourth cache data related to the indicated control node 10, and then write the fourth cache data into the storage module 26 to implement a disk-dropping process for the cache data of the indicated control node 10.
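The controller/processor handshake for a powered-down control node can be sketched as below. The class and method names are illustrative assumptions; the per-node cache partitions and the storage module are modeled as dictionaries.

```python
# Hypothetical sketch: the CPLD (controller 27) turns a node's power-loss
# identification into a backup signal, and the first processor lands only
# that node's cache partition (the "fourth cache data") to the storage module.

class Interconnect:
    def __init__(self):
        self.partitions = {}    # node id -> cache partition (dict)
        self.storage = {}       # node id -> landed cache data

    def controller_on_identification(self, node_id, powered_down):
        """Controller 27: forward a backup signal only on power loss."""
        if powered_down:
            self.processor_on_backup_signal(node_id)

    def processor_on_backup_signal(self, node_id):
        """First processor 23: land the indicated node's cache partition."""
        self.storage[node_id] = dict(self.partitions.get(node_id, {}))

ic = Interconnect()
ic.partitions["node2"] = {"lba7": b"page"}
ic.controller_on_identification("node2", powered_down=True)   # lands node2
ic.controller_on_identification("node3", powered_down=False)  # no-op
```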
According to the embodiment of the invention, by arranging the standby power module and the storage module on the interconnection device, the data landing process can be carried out on the interconnection device without the participation of the control node. After the AC supply of a control node is powered off, the control node only needs to provide, to the controller of the interconnection device, identification information indicating the state of its AC supply. The controller receives and processes the identification information to obtain a backup signal and sends the backup signal to the first processor; upon receiving the backup signal, the first processor determines that the cache data of that control node must be landed and writes the corresponding cache data into the storage module. Thus, the whole data landing process only requires the control node to provide the identification information to the interconnection device, decoupling the landing process from the control node. Since the control node does not need to remain powered during the landing process, the standby power modules on all the control nodes can be removed, reducing the hardware cost of the control nodes.
Optionally, in the interconnection device, the PCIe protocol used by the first switch module may support PCIe hot plug functionality. When the PCIe link fails, such as bandwidth reduction of the PCIe link, transmission speed reduction of the PCIe link, and disconnection of the PCIe link, hot plug repair can be performed through a first processor arranged in the interconnection device.
FIG. 8A shows a schematic diagram of a memory system according to another embodiment of the invention.
As shown in fig. 8A, in the storage system, the memory module 22 may include a plurality of memory units 221, the plurality of memory units 221 may be electrically connected to the controller 27, and the first processor 23 may be electrically connected to the controller 27.
Each memory unit 221 may be electrically connected to the first switch module 21 through an 8-channel PCIe link, and the number of memory units 221 may be increased according to the number of PCIe channels of the first switch module 21, which is not limited herein.
The memory unit 221 can support a hot plug function in terms of structure and hardware design, and when a certain memory unit 221 fails, the memory unit 221 can be replaced online.
In order to increase the reliability of the memory system, in the event of a failure of a memory unit 221, online repair may be performed through cooperation of the controller 27 with the first processor 23. Specifically, the first in-place signal, the power control signal, and the reset signal of the PCIe link corresponding to each memory unit 221 may be connected to the controller 27; the controller 27 forwards the processed signals to the first processor 23, and the first processor 23 performs online repair operations as needed.
In this online repair operation, a reset operation for the memory unit 221 may be performed preferentially, since a reset only restarts part of the circuits in the memory unit 221 and therefore takes less time.
Each memory unit 221 may be configured to provide a first in-place signal to the controller 27. The first processor 23 may be configured to poll each memory unit 221, and in the event that the check determines that a memory unit 221 is faulty, the first processor 23 may send fault information to the controller 27.
After receiving the fault information, the controller 27 may first determine whether the faulty memory unit is suitable for online repair; that is, the controller 27 may determine, based on the received first in-place signal of the faulty memory unit, whether the unit is in place. If the faulty memory unit is determined to be out of place, i.e., detached from its socket or connection interface, it is not suitable for online repair; in this case the controller 27 may pass the out-of-place determination result to the first processor 23, so that the first processor 23 sends warning information to the operation and maintenance personnel of the storage system, who can then repair the memory unit manually. If the faulty memory unit is determined to be in place, the fault may preferentially be repaired online; in this case the controller 27 may send a repair signal to the first processor 23, where the repair signal indicates the faulty memory unit.
The first processor 23 may determine the indicated failed memory cell based on the repair signal and send a reset instruction to the failed memory cell to control the failed memory cell to perform a reset operation.
Alternatively, the controller 27 may be connected to the reset signal interface of each memory unit 221, and the first processor 23 may send a reset instruction to the failed memory unit via the corresponding reset signal interface in the controller 27, so that the failed memory unit performs a reset operation.
Alternatively, after sending the reset instruction, the first processor 23 may continue to poll the status of the plurality of memory units 221; if the polling result shows that the faulty memory unit has still not recovered, the first processor 23 may perform a restart operation on the faulty memory unit.
Alternatively, the controller 27 may be connected to the power control interface of each memory unit 221, and the first processor 23 may send a restart instruction to the failed memory unit via the corresponding power control interface in the controller 27, so that the failed memory unit performs a restart operation.
Alternatively, the restart operation may be performed by powering up and powering down the failed memory cell. Specifically, the first processor 23 may control the main power module to disconnect and reconnect with the power supply link of the failed memory unit to perform the power up and power down operation for the failed memory unit.
Optionally, if the restart operation still cannot repair the faulty memory unit, the first processor 23 may send a warning to the operation and maintenance personnel of the storage system so that they can repair the memory unit manually.
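The escalating repair ladder described above (poll, in-place check, reset, power-cycle restart, manual alert) can be sketched as a simple decision function. The memory-unit model is hypothetical; in the real device the reset and restart go through the controller's reset signal and power control interfaces respectively.

```python
# Hypothetical sketch of the memory-unit repair escalation: try the cheapest
# action first (reset restarts only part of the circuits), fall back to a
# power-cycle restart, and alert operations staff if neither applies.

def repair_memory_unit(unit):
    """unit: dict with 'in_place' (bool) and 'recoverable_by' (set of actions
    that would clear this fault). Returns the action taken, or 'alert'."""
    if not unit["in_place"]:
        return "alert"                 # detached from socket: manual repair
    for action in ("reset", "restart"):
        if action in unit["recoverable_by"]:
            return action              # cheapest sufficient action wins
    return "alert"                     # still faulty after a power cycle
```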
FIG. 8B shows a schematic diagram of a memory system according to another embodiment of the invention.
As shown in fig. 8B, in the storage system, the interconnection device may further include a plurality of first connectors 28, which may be electrically connected with the first switching module 21 and the control node 10, respectively, to form a connection link between the control node 10 and the first switching module 21.
The first connector 28 may be a CDFP (400G Form-factor Pluggable; "CD" is the Roman numeral for 400) connector.
The control node 10 may be provided therein with a signal compensation card 11, a second processor 12 and a node power supply module 13.
The signal compensation card 11 may be a retimer card of the control node 10. The signal compensation card 11 can recover, clean, and reconstruct high-speed digital signals to overcome distortion caused by distance, loss, noise, and jitter during transmission. It can compensate for channel loss, eliminate signal jitter, improve signal integrity, and reduce the bit error rate, thereby extending the effective transmission distance of high-speed signals.
The control node 10 may be electrically connected with the first connector 28 through the signal compensation card 11, and thus, the connection link between the control node 10 and the first switching module 21 may include a sub-link between the signal compensation card 11 and the first connector 28, and a sub-link between the first connector 28 and the first switching module 21.
The second processor 12 may be a CPU of the control node 10. The second processor 12 may be connected to the signal compensation card 11, and the buffered data generated by the second processor 12 may be sequentially sent to the first switch module 21 via the signal compensation card 11 and the first connector 28.
The node power supply module 13 may be a PSU of the control node 10. A node power module 13 may be used to power the various components on the control node 10, such as the signal compensation card 11 and the second processor 12. Similar to the main power module on the interconnect device, the node power module 13 may also provide 220V ac power, 12V dc power, etc. to the signal compensation card 11, the second processor 12, etc., which is not limited herein. Alternatively, the control node 10 may be considered powered down when the node power module 13 is unable to provide 220V ac power to the various components on the control node 10.
Alternatively, when a connection link of one control node 10 with the first switching module 21 is abnormal, the first processor 23 and the second processor 12 may synchronously perform an online repair operation for the connection link.
For the first processor 23, either the first processor 23 may access the second in-place signal of each first connector 28 directly, or the controller 27 may access the second in-place signal of each first connector 28 and feed back the in-place determination result of each first connector 28 to the first processor 23.
The first processor 23 may perform polling detection on the plurality of connection links. When an abnormality of one connection link is detected based on the polling result, the first processor 23 may determine the target connector associated with the abnormal connection link and acquire the second in-place signal of the target connector, or its in-place determination result. If the target connector is determined to be out of place based on the second in-place signal or the in-place determination result, the first processor 23 may send warning information to the operation and maintenance personnel of the storage system, so that they can repair the connector manually. If the target connector is determined to be in place, the first processor 23 may sequentially issue a disable instruction and an enable instruction to the target connector to perform an in-band repair of the target connector.
For the second processor 12, similarly to the first processor 23, the second processor 12 may access the second in-place signal of each first connector 28. The second processor 12 may likewise perform polling detection on the plurality of connection links; when an abnormality of one connection link is detected based on the polling result, the second processor 12 may determine the target connector associated with the abnormal connection link and acquire its second in-place signal. If the target connector is determined to be in place based on the second in-place signal, the second processor 12 may determine the target signal compensation card associated with the abnormal connection link and control the node power module 13 to disconnect and reconnect the power supply link of the target signal compensation card, thereby power-cycling the target signal compensation card.
Alternatively, the repair operation performed by the first processor 23 and the repair operation performed by the second processor 12 may be executed sequentially: for example, the in-band repair may be attempted first by the first processor 23, and only after determining that it cannot complete the online repair of the connection link does the second processor 12 power-cycle the signal compensation card. Or the two repair operations may be performed synchronously, which is not limited herein.
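The sequential ordering just described can be sketched as a small decision function. The link model and names are assumptions; the actual in-band repair issues disable/enable instructions to the connector, and the power-cycle goes through the node power module.

```python
# Hypothetical sketch of the sequential link-repair ordering: the first
# processor's in-band connector repair is tried first, and the second
# processor's power-cycle of the signal compensation card is the fallback.

def repair_connection_link(link):
    """link: dict with 'connector_in_place' (bool) and 'fixable_by' (set of
    repairs that would clear this fault). Returns the action taken."""
    if not link["connector_in_place"]:
        return "alert"                  # connector unseated: manual repair
    if "inband" in link["fixable_by"]:
        return "inband"                 # first processor: disable + enable
    if "power_cycle" in link["fixable_by"]:
        return "power_cycle"            # second processor: retimer card
    return "alert"
```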
According to the embodiment of the invention, through the structure and hardware design, the PCIe link of the first switching module can support online repair, and when the online repair condition is met, the first processor, the second processor, the controller and other components can cooperatively perform online repair of the connection link between the control node and the first switching module and the connection link between the memory unit and the first switching module, so that the operation and maintenance cost can be saved, and the operation and maintenance efficiency can be improved.
Optionally, the interconnect device may also support decoupling of the data mirroring channels and the cluster management channels.
FIG. 9 shows a schematic diagram of a memory system according to another embodiment of the invention.
As shown in fig. 9, in the storage system, a second switching module 29 may also be provided in the interconnect device. The second switching module 29 may be electrically connected to the plurality of control nodes 10 and to the first processor 23, respectively.
The second switching module 29 may be a Switch chip supporting the CXL protocol, which may enable high-speed interconnection of the plurality of control nodes 10 with the first processor 23.
Alternatively, the second switching module 29 may be a hub of a cluster management channel in the storage system, i.e. the signals transmitted in the second switching module 29 may be network signals.
Optionally, the control node 10 may have an optical network card 14. The optical network card 14 may be electrically connected to the second processor 12, and the second processor 12 may transmit the network signal in the form of an optical signal via the optical network card 14.
Accordingly, the interconnect device may be provided with an interface or connector for receiving a network signal in the form of an optical signal. For example, the interconnection device may further include a plurality of second connectors 210, where the second connectors 210 may be electrically connected to the second switch module 29 and to the control node 10, and in particular may be electrically connected to the optical port network card 14 on the control node 10 through an optical fiber, so that the plurality of control nodes 10 implement network interconnection through at least one interconnection device.
The second connector 210 may be an SFP (Small Form-factor Pluggable) optical module, which can implement photoelectric conversion. At its transmitting end, i.e., the end connected to the optical network card 14, the second connector 210 provides an electrical signal to a laser driver, which drives a laser diode to emit light, converting the electrical signal into an optical signal. At its receiving end, i.e., the end connected to the second switching module 29, the second connector converts the optical signal into an electrical signal via a photodetector using the photoelectric effect; after shaping and amplification, the originally input electrical signal is recovered and provided to the second switching module 29.
The second processor 12 may be configured to generate a management signal and provide the management signal to the second switching module 29 via the optical network card 14 and the second connector 210, so that the first processor 23 performs out-of-band management on the plurality of control nodes 10 based on the management signal.
In some embodiments, the first processor 23 cannot directly process the management signal provided by the second switching module 29. At this time, the interconnection device may further be provided with a switching module 211. The switching module 211 may be electrically connected to the second switching module 29 and the first processor 23, respectively, and may perform format conversion on the received management signal to convert it into a management signal suitable for processing by the first processor 23.
The second switching module 29 may also be used to provide management signals from the plurality of control nodes 10 to the first processor 23 via the transit module 211. The first processor 23 may be configured to manage the plurality of control nodes 10 out-of-band based on the plurality of management signals.
According to the embodiment of the invention, the second switching module is arranged in the interconnection equipment, so that the interconnection equipment can be used for realizing network interconnection and out-of-band management of a plurality of control nodes, the network communication requirement among clusters is met, an additional switch is not required to be arranged in a storage system, and the space of a cabinet can be saved while the cost is saved, so that the expansion of the storage system is facilitated.
Optionally, the interconnection device may use a BMC (Baseboard Management Controller) to implement chassis management, where the BMC may implement functions such as fan control, temperature acquisition, and anomaly alarm monitoring. In addition, the interconnection device may also use the controller to implement fan control during the start-up phase, the BBU battery standby phase, and BMC abnormal states, so as to improve the stability of the interconnection device across its whole operating life.
FIG. 10 shows a schematic diagram of a memory system according to another embodiment of the invention.
As shown in fig. 10, in the storage system, the interconnect device may further include a fan module 212 and a baseboard management controller 213. The baseboard management controller 213 may be electrically connected to the fan module 212 and is used to control rotation of the fan module 212.
For example, temperature sensors may be provided at each heat generating component of the interconnect device, and accordingly, each fan of the fan module 212 may also be directed toward each heat generating component. The baseboard management controller 213 may obtain temperature data fed back by each temperature sensor, and adjust the rotation speed of each fan according to a set fan modulation strategy based on the temperature data, so as to realize rotation control of the fan module.
Optionally, the controller 27 may also be electrically connected to the fan module 212. The controller 27 may be used to control the rotation of the fan module 212 when the interconnect device is in its start-up phase or the baseboard management controller 213 fails.
Specifically, during the start-up phase of the interconnect device, the controller 27 is generally responsible for the start-up timing of the various components of the interconnect device, so the controller 27 is generally activated before the baseboard management controller 213. The controller 27 may therefore control the fans of the fan module 212 according to a preset fan-modulation strategy while the interconnect device is starting up. Similarly, when the baseboard management controller 213 fails, the controller 27 must take over rotation control of the fan module 212 to prevent the heat-generating components of the interconnect device from burning out due to excessive temperature.
Optionally, when the interconnection device is in the BBU standby phase, its power consumption should be reduced while safe operation is still guaranteed. Because the BBU can supply only limited power and the baseboard management controller 213 consumes a relatively large amount, the baseboard management controller 213 may be turned off first in this phase, and the controller 27 performs rotation control of the fan module 212 instead.
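The takeover rules across these three situations can be summarized in one small decision function. This is a sketch under stated assumptions: the phase names and the `fan_control_owner` helper are illustrative, not part of the patent.

```python
def fan_control_owner(phase, bmc_healthy=True):
    """Return which component drives the fan module in a given phase,
    per the takeover rules described above."""
    if phase == "startup":        # controller 27 boots before the BMC 213
        return "controller"
    if phase == "bbu_standby":    # BMC 213 is shut down to conserve BBU power
        return "controller"
    if not bmc_healthy:           # BMC failure: controller 27 takes over
        return "controller"
    return "bmc"                  # normal operation: BMC 213 owns the fans
```

Encoding the ownership decision in one place makes the handover explicit: the BMC is the default owner, and the controller covers every window in which the BMC is absent or untrusted.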
FIG. 11 illustrates a schematic diagram of a storage system cluster, in accordance with an embodiment of the invention.
As shown in fig. 11, a cluster of storage systems may include two storage systems.
Optionally, the interconnection device of each storage system includes a first expansion interface Scale-out1. The first expansion interface Scale-out1 is electrically connected to the first switching module included in that interconnection device and to the first expansion interface Scale-out1 of the other storage system, thereby cascading the two storage systems.
Optionally, the interconnection device of each storage system includes a second expansion interface Scale-out2. The second expansion interface Scale-out2 is electrically connected to the second switching module included in that interconnection device and to the second expansion interface Scale-out2 of the other storage system, thereby cascading the two storage systems.
Alternatively, in another embodiment, a cluster of storage systems may include more than two storage systems, which may be interconnected based on a ring topology to form the cluster of storage systems.
Optionally, the interconnection device of each storage system includes two first expansion interfaces Scale-out1, both electrically connected to the first switching module included in that interconnection device. One first expansion interface Scale-out1 may be electrically connected to a first expansion interface Scale-out1 of one neighboring storage system, and the other first expansion interface Scale-out1 to a first expansion interface Scale-out1 of a different neighboring storage system, thereby cascading more than two storage systems.
Optionally, the interconnection device of each storage system includes two second expansion interfaces Scale-out2, both electrically connected to the second switching module included in that interconnection device. One second expansion interface Scale-out2 may be electrically connected to a second expansion interface Scale-out2 of one neighboring storage system, and the other second expansion interface Scale-out2 to a second expansion interface Scale-out2 of a different neighboring storage system, thereby cascading more than two storage systems.
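The ring topology above can be sketched by pairing expansion interfaces of adjacent systems so that the last system closes the loop back to the first. The `ring_links` helper and the index convention are illustrative assumptions.

```python
def ring_links(n):
    """Return the cascade links, as (system, system) index pairs,
    for n storage systems connected in a ring."""
    if n < 2:
        return []                 # nothing to cascade
    if n == 2:
        return [(0, 1)]           # two systems need a single direct link
    # With n > 2, each system i uses one expansion interface toward system
    # i+1 and the other toward system i-1; listing (i, i+1) pairs, with the
    # final pair wrapping to 0, covers every interface exactly once.
    return [(i, (i + 1) % n) for i in range(n)]

print(ring_links(4))  # [(0, 1), (1, 2), (2, 3), (3, 0)]
```

A ring uses only two expansion interfaces per interconnection device regardless of cluster size, which is consistent with the two Scale-out1 (or Scale-out2) interfaces described above.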
According to the embodiment of the invention, cascading of multiple storage systems can be achieved through the first and second expansion interfaces reserved on the interconnection equipment, interconnecting more control nodes. This effectively improves the flexibility and expandability of the storage system and reduces the cost of the multi-controller shared-storage architecture.
Those skilled in the art will appreciate that the features recited in the various embodiments of the invention can be combined and/or incorporated in a variety of ways, even if such combinations or incorporations are not explicitly recited in the present invention. In particular, the features recited in the various embodiments of the invention can be combined and/or incorporated in various ways without departing from the spirit and teachings of the invention. All such combinations and/or incorporations fall within the scope of the invention.
The embodiments of the present invention are described above. These examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the invention, and such alternatives and modifications are intended to fall within the scope of the invention.

Claims (18)

1. A storage system, characterized in that the storage system comprises: a plurality of control nodes; and at least one interconnect device, the interconnect device including a first switching module and a memory module, the first switching module being electrically connected to the plurality of control nodes and electrically connected to the memory module, and the first switching modules of the at least one interconnect device being interconnected with each other; wherein the first switching module is used to write first cached data provided by a control node into the memory module, or to provide second cached data stored in the memory module to a control node, so that the plurality of control nodes share a cache through the at least one interconnect device; wherein the interconnect device further includes: a first processor; a controller electrically connected to the first processor and to the memory module; and a storage module electrically connected to the first processor, the storage module being used to store cached data; the controller is used to receive identification information from a control node and, upon determining that the identification information indicates that the control node is powered off, to send a backup signal to the first processor; the first processor is used to read, based on the backup signal, fourth cached data related to the control node from the memory module, and to write the fourth cached data into the storage module.

2. The storage system according to claim 1, wherein the first processors of the at least one interconnect device are interconnected with each other.

3. The storage system according to claim 2, wherein, in the event that one interconnect device fails, the other interconnect devices are used to read third cached data from the memory module of the failed interconnect device via the first processor of the failed interconnect device, and to write the third cached data into the memory modules of the other interconnect devices.

4. The storage system according to claim 2, wherein the memory module comprises a plurality of memory units, the memory units being used to provide a first presence signal to the controller; wherein the first processor is used to send fault information to the controller upon detecting that a memory unit has failed; the controller is used, in response to receiving the fault information and upon determining that the first presence signal of the failed memory unit indicates presence, to send a repair signal to the first processor; the first processor is further used, based on the repair signal, to send a reset instruction to the failed memory unit so as to control the failed memory unit to perform a reset operation.

5. The storage system according to claim 4, wherein the interconnect device further comprises a main power module, the main power module being used to supply power to the first switching module, the memory module, the first processor, the controller, and the storage module; wherein the first processor is further used, upon determining that the failed memory unit has not completed fault repair, to control the power supply link between the main power module and the failed memory unit to be disconnected and reconnected, so as to perform power-on and power-off operations on the failed memory unit.

6. The storage system according to claim 5, wherein the interconnect device further comprises: a backup power module, used to supply power to the first switching module, the memory module, the first processor, the controller, and the storage module in the event that the main power module operates abnormally.

7. The storage system according to claim 5, wherein, in the event that one interconnect device fails, the main power modules of the other interconnect devices are further used to supply power to the failed interconnect device.

8. The storage system according to claim 2, wherein the interconnect device further comprises a plurality of first connectors, the first connectors being electrically connected to the first switching module and to the control nodes respectively, so as to form connection links between the control nodes and the first switching module.

9. The storage system according to claim 8, wherein the first connector is used to provide a second presence signal to the first processor; the first processor is used, upon detecting that a connection link is abnormal, to determine the target connector associated with the abnormal connection link and, upon determining that the second presence signal provided by the target connector indicates presence, to issue a disable instruction and then an enable instruction to the target connector in sequence, so as to perform in-band repair on the target connector.

10. The storage system according to claim 9, wherein the control node comprises: a signal compensation card electrically connected to the first connector; a second processor electrically connected to the signal compensation card; and a node power supply module used to supply power to the signal compensation card and the second processor respectively; wherein the second processor is used, upon detecting that a connection link is abnormal, to determine the target signal compensation card associated with the abnormal connection link and to control the power supply link between the node power supply module and the target signal compensation card to be disconnected and reconnected, so as to perform power-on and power-off operations on the target signal compensation card.

11. The storage system according to claim 2, wherein the interconnect device further comprises: a second switching module; and a plurality of second connectors, the second connectors being electrically connected to the second switching module and to the control nodes, so that the plurality of control nodes achieve network interconnection through the at least one interconnect device.

12. The storage system according to claim 11, wherein the interconnect device further comprises: a transit module electrically connected to the second switching module and the first processor respectively; wherein the second switching module is further used to provide, via the transit module, the management signals from the plurality of control nodes to the first processor; the first processor is used to perform out-of-band management of the plurality of control nodes based on the plurality of management signals.

13. The storage system according to claim 12, wherein the control node further comprises: an optical-port network card electrically connected to the second connector; and a second processor electrically connected to the optical-port network card; wherein the second processor is used to generate the management signal and to provide the management signal to the second switching module via the optical-port network card and the second connector, so that the first processor performs out-of-band management of the control node based on the management signal.

14. The storage system according to claim 1, wherein the interconnect device further comprises: a fan module; and a baseboard management controller electrically connected to the fan module, the baseboard management controller being used to control the rotation of the fan module.

15. The storage system according to claim 14, wherein the interconnect device further comprises: a controller electrically connected to the fan module, the controller being used to control the rotation of the fan module in the event that the interconnect device is in a start-up phase or the baseboard management controller fails.

16. A storage system cluster, characterized in that the storage system cluster comprises at least two storage systems according to any one of claims 1 to 15.

17. The storage system cluster according to claim 16, wherein the interconnect device of the storage system comprises at least one first expansion interface, the first expansion interface being electrically connected to the first switching module included in the interconnect device of the storage system and electrically connected to a first expansion interface of another storage system, so as to cascade at least two of the storage systems.

18. The storage system cluster according to claim 16, wherein the interconnect device of the storage system comprises at least one second expansion interface, the second expansion interface being electrically connected to the second switching module included in the interconnect device of the storage system and electrically connected to second expansion interfaces of other storage systems, so as to cascade at least two of the storage systems.
CN202511299744.2A 2025-09-11 2025-09-11 Storage systems and storage system clusters Active CN120803374B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511299744.2A CN120803374B (en) 2025-09-11 2025-09-11 Storage systems and storage system clusters


Publications (2)

Publication Number Publication Date
CN120803374A CN120803374A (en) 2025-10-17
CN120803374B (en) 2025-11-28

Family

ID=97325220


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102782661A (en) * 2012-05-18 2012-11-14 华为技术有限公司 Data storage system and method





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant