CN112596960B - Distributed storage service switching method and device - Google Patents
- Publication number
- CN112596960B (application CN202011344216.1A)
- Authority
- CN
- China
- Prior art keywords
- target
- network card
- storage server
- smart network
- distributed storage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2053—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
- G06F11/2089—Redundant storage control functionality
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Hardware Redundancy (AREA)
Abstract
The present application relates to the technical field of data storage, and in particular to a distributed storage service switching method and device. The method includes: receiving a first data read/write request sent by a client, and determining a target storage server for processing the first data read/write request; sending the first data read/write request to the target smart network card corresponding to the target storage server, so that the target smart network card processes the first data read/write request based on the distributed storage service running locally on it; and, if a failure of the target smart network card is detected, starting the distributed storage service that is deployed on the target storage server and set to a to-be-started state, so that the target storage server, based on its locally running distributed storage service, processes a second data read/write request that is sent by the client and must be handled by the target storage server.
Description
Technical field
The present application relates to the technical field of data storage, and in particular to a distributed storage service switching method and device.
Background
Distributed storage spreads data across multiple independent devices. A traditional network storage system keeps all data on a centralized storage server, which becomes the bottleneck of system performance and the focal point for reliability and security, and cannot meet the needs of large-scale storage applications. A distributed network storage system adopts a scalable architecture, uses multiple storage servers to share the storage load, and uses a location server to locate stored information. This improves the reliability, availability, and access efficiency of the system and also makes it easy to expand.
The traditional ServerSan (software-defined storage) distributed storage architecture is as follows: each server is fitted with an ordinary network card, and the distributed storage software runs on the host's x86 CPU. Nodes communicate over TCP through the kernel network protocol stack. On each node, the distributed storage software manages the NVMe devices that are local to the physical host (attached over PCIe). The cluster nodes work together to provide storage services externally and to guarantee the consistency of the stored data.
However, in the traditional distributed storage architecture, the distributed storage software running on a physical machine occupies that machine's CPU, memory, and other system resources. Because it shares the physical machine's CPU resources with virtual machines, the number of CPUs it occupies affects the number of virtual machines that can be created. The performance of distributed storage running on a physical machine is affected by the physical machine's OS and by the running virtual machines. The distributed storage software running on the physical machine also consumes some system resources to manage its local storage resources, because it accesses the abstract block layer through system calls.
To solve the above problems, the storage server can be configured with NVMe disks, a smart network card, and an NVMe-oF target. The smart network card has a CPU that is independent of the physical server, such as an Arm CPU, and the distributed storage software runs on that Arm CPU. The data disks managed by the distributed storage software are network data disks formed by connecting over NVMe-oF to the NVMe-oF target of the local host. Multiple Arm storage nodes form the distributed storage.
However, the reliability of a distributed storage cluster based on smart network cards is limited by the number of smart-network-card Arm nodes; for example, the minimum number of nodes in a distributed storage cluster is 3. When one smart network card fails in a cluster made up of three smart network cards, only two nodes remain in the cluster and there is a risk of split-brain; when more than two smart network cards fail in such a cluster, the cluster cannot provide service.
For distributed storage with a certain number of nodes, when a small number of cluster nodes go down and the number of remaining nodes still meets the requirement, the conventional approach is to trigger data rebalancing. When only the smart network card fails and the disks are healthy, the node cannot read or write its disks because of the smart network card failure. The cluster can still provide storage services, but new storage I/O is spread across the disks of the other nodes, which eventually leaves the data of the whole cluster unbalanced.
Summary of the invention
The present application provides a distributed storage service switching method and device to solve the problem in the prior art that the distributed storage service becomes unavailable because of a smart network card failure.
In a first aspect, the present application provides a distributed storage service switching method, applied to a distributed storage system in which each storage server is configured with a corresponding smart network card, a distributed storage service runs on each smart network card, each smart network card establishes an RDMA channel with the controller on its corresponding storage server that manages local storage resources, and each storage server has a distributed storage service deployed on it that is set to a to-be-started state. The method includes:
receiving a first data read/write request sent by a client, and determining a target storage server for processing the first data read/write request;
sending the first data read/write request to the target smart network card corresponding to the target storage server, so that the target smart network card processes the first data read/write request based on the distributed storage service running locally on it;
if a failure of the target smart network card is detected, starting the distributed storage service that is deployed on the target storage server and set to the to-be-started state, so that the target storage server, based on its locally running distributed storage service, processes a second data read/write request that is sent by the client and must be handled by the target storage server.
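The three steps of the first aspect can be sketched as a small state model. This is only an illustration under stated assumptions: the names `StorageNode`, `ServiceState`, and `handle` are hypothetical and do not come from the application, and the smart network card's health is reduced to a boolean flag.

```python
from dataclasses import dataclass, field
from enum import Enum


class ServiceState(Enum):
    STANDBY = "standby"   # deployed on the host but not started ("to be started")
    RUNNING = "running"


@dataclass
class StorageNode:
    """Hypothetical model of one storage server and its smart network card."""
    name: str
    nic_healthy: bool = True
    host_service: ServiceState = ServiceState.STANDBY
    log: list = field(default_factory=list)  # (request, handler) pairs

    def handle(self, request: str) -> str:
        # On a detected NIC failure, start the host-side service that was
        # deployed in the to-be-started state.
        if not self.nic_healthy and self.host_service is ServiceState.STANDBY:
            self.host_service = ServiceState.RUNNING
        # Healthy NIC: the NIC-side service processes the request.
        # Failed NIC: the host-side service processes it instead.
        handler = "smart_nic" if self.nic_healthy else "host"
        self.log.append((request, handler))
        return handler


node = StorageNode("storage_server_1")
first = node.handle("first read/write request")
node.nic_healthy = False            # assumed: a failure has been detected
second = node.handle("second read/write request")
```

In this sketch the first request is served by the smart network card and the second by the host, which mirrors the first and second data read/write requests in the text.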
Optionally, the step in which the target smart network card processes the first data read/write request based on the locally running distributed storage service includes:
the target smart network card sends the first data read/write request to the controller on the target storage server that manages local storage resources, wherein the controller processes the first data read/write request through its corresponding RDMA channel.
Optionally, while operating normally, the target smart network card writes heartbeat count information to a first designated location in the memory of the target storage server at a preset period;
the step of detecting a failure of the target smart network card includes:
when it is detected that the heartbeat count maintained at the first designated location in the memory of the target storage server has not increased within a preset duration, determining that a failure of the target smart network card has been detected.
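The heartbeat check described above amounts to watching a counter and declaring a failure when it stops increasing for a preset duration. The following is a minimal sketch under stated assumptions: `HeartbeatMonitor` and `observe` are hypothetical names, the counter value is passed in by the caller rather than read from a real memory location, and timestamps are plain seconds.

```python
class HeartbeatMonitor:
    """Declares the writer failed if its counter has not grown for timeout_s."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_count = None    # last counter value that was an increase
        self.last_change = None   # time at which that increase was observed

    def observe(self, count: int, now: float) -> bool:
        """Record one sample; return True if the writer is considered failed."""
        if self.last_count is None or count > self.last_count:
            # The counter advanced, so the writer is alive.
            self.last_count = count
            self.last_change = now
            return False
        # The counter did not advance; fail once the stall reaches the timeout.
        return (now - self.last_change) >= self.timeout_s


monitor = HeartbeatMonitor(timeout_s=3.0)
samples = [(1, 0.0), (2, 1.0), (2, 3.0), (2, 4.0)]
results = [monitor.observe(count, now) for count, now in samples]
```

The same check can be pointed at the second designated location to watch the host-side service after a switchover.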
Optionally, after the distributed storage service that is deployed on the target storage server and set to the to-be-started state has been started, the target storage server writes heartbeat count information to a second designated location in memory at a preset period;
when the target smart network card is detected to have recovered, if the distributed storage service deployed on the target storage server is in the to-be-started state and/or the heartbeat count maintained at the second designated location in the memory of the target storage server has not increased within a preset duration, the target smart network card starts the distributed storage service and processes a third data read/write request that is sent by the client and must be handled by the target storage server.
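The recovery condition above is a simple disjunction, which a short predicate makes explicit. This is a sketch only; the function name and the string `"standby"` used for the to-be-started state are assumptions made for illustration.

```python
def nic_may_start_directly(host_service_state: str, host_heartbeat_stalled: bool) -> bool:
    """Return True if the recovered smart NIC can start its service at once.

    The NIC starts directly when the host-side service is still in the
    to-be-started state (modelled here as "standby"), or when the host-side
    heartbeat at the second designated location has stopped increasing.
    """
    return host_service_state == "standby" or host_heartbeat_stalled


cases = {
    ("standby", False): nic_may_start_directly("standby", False),
    ("running", True): nic_may_start_directly("running", True),
    ("running", False): nic_may_start_directly("running", False),
}
```

Only the last case, a host-side service that is running and still producing heartbeats, requires the explicit switch-back handshake.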
Optionally, the method further includes:
when the target smart network card is detected to have recovered, if the distributed storage service deployed on the target storage server is running normally, the target smart network card sends a switching instruction to the target storage server and starts a timer, so that the target storage server sets its locally running distributed storage service to the to-be-started state, starts the RDMA channel with the target smart network card, and sends a switching-completion instruction to the target smart network card; if the target smart network card receives the switching-completion instruction, or has not received the switching-completion instruction when the timer expires, the target smart network card starts the distributed storage service and processes the third data read/write request that is sent by the client and must be handled by the target storage server.
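The switch-back handshake can be sketched with a timed wait: the smart network card sends the switching instruction, then waits for the completion signal or for the timer to expire, and starts its service in either case. The sketch below assumes a hypothetical `switch_back` helper and uses Python's `threading.Event` to stand in for the switching-completion instruction; the real transport between the card and the host is not specified here.

```python
import threading


def switch_back(send_switch_instruction, completion: threading.Event,
                timeout_s: float) -> str:
    """Run the NIC side of the handshake and report why the service started."""
    send_switch_instruction()                  # ask the host to step down
    acknowledged = completion.wait(timeout_s)  # timer: wait at most timeout_s
    # The NIC starts its distributed storage service in both outcomes;
    # the return value only records which branch was taken.
    return "started_after_ack" if acknowledged else "started_after_timeout"


# Host answers: the completion event is set when the instruction arrives.
done = threading.Event()
with_ack = switch_back(done.set, done, timeout_s=0.05)

# Host never answers: the timer expires and the NIC starts anyway.
silent = threading.Event()
without_ack = switch_back(lambda: None, silent, timeout_s=0.01)
```

Starting on timeout keeps a host that has hung during the handover from blocking the recovered card indefinitely.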
Optionally, the method further includes:
when the distributed storage service is started on the target smart network card, loading the metadata stored in the storage resources of the target storage server into a third designated location in the memory of the target storage server, wherein, when a failure of the target smart network card is detected and the distributed storage service that is deployed on the target storage server and set to the to-be-started state is started, the target storage server processes the second data read/write request based on the metadata stored at the third designated location.
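The point of preloading is that the host-side service can serve requests after a switchover without first re-reading metadata from disk. A minimal sketch, assuming hypothetical names (`HostMemory`, `start_nic_service`, `host_lookup`) and modelling the third designated location as a dictionary attribute:

```python
class HostMemory:
    """Stand-in for the host memory region shared with the smart NIC."""

    def __init__(self):
        self.metadata_region = None  # the "third designated location"


def start_nic_service(host_memory: HostMemory, read_metadata_from_disk) -> None:
    # When the NIC-side service starts, copy the on-disk metadata into the
    # designated region of host memory.
    host_memory.metadata_region = dict(read_metadata_from_disk())


def host_lookup(host_memory: HostMemory, key: str):
    # After failover the host-side service resolves requests from the
    # preloaded region instead of reading the metadata again.
    if host_memory.metadata_region is None:
        raise RuntimeError("metadata was never preloaded")
    return host_memory.metadata_region[key]


memory = HostMemory()
start_nic_service(memory, lambda: {"object-1": "disk0:block128"})
location = host_lookup(memory, "object-1")
```

The metadata contents shown (`object-1`, `disk0:block128`) are made-up sample values.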
In a second aspect, the present application provides a distributed storage service switching device, applied to a distributed storage system in which each storage server is configured with a corresponding smart network card, a distributed storage service runs on each smart network card, each smart network card establishes an RDMA channel with the controller on its corresponding storage server that manages local storage resources, and each storage server has a distributed storage service deployed on it that is set to a to-be-started state. The device includes:
a receiving unit, configured to receive a first data read/write request sent by a client, and to determine a target storage server for processing the first data read/write request;
a sending unit, configured to send the first data read/write request to the target smart network card corresponding to the target storage server, so that the target smart network card processes the first data read/write request based on the distributed storage service running locally on it;
a switching unit, configured to start, when a failure of the target smart network card is detected, the distributed storage service that is deployed on the target storage server and set to the to-be-started state, so that the target storage server, based on its locally running distributed storage service, processes a second data read/write request that is sent by the client and must be handled by the target storage server.
Optionally, the step in which the target smart network card processes the first data read/write request based on the locally running distributed storage service includes:
the target smart network card sends the first data read/write request to the controller on the target storage server that manages local storage resources, wherein the controller processes the first data read/write request through its corresponding RDMA channel.
Optionally, while operating normally, the target smart network card writes heartbeat count information to a first designated location in the memory of the target storage server at a preset period;
when detecting a failure of the target smart network card, the switching unit is specifically configured to:
determine that a failure of the target smart network card has been detected when it is detected that the heartbeat count maintained at the first designated location in the memory of the target storage server has not increased within a preset duration.
Optionally, after the distributed storage service that is deployed on the target storage server and set to the to-be-started state has been started, the target storage server writes heartbeat count information to a second designated location in memory at a preset period;
the switching unit is further configured so that, when the target smart network card is detected to have recovered, if the distributed storage service deployed on the target storage server is in the to-be-started state and/or the heartbeat count maintained at the second designated location in the memory of the target storage server has not increased within a preset duration, the target smart network card starts the distributed storage service and processes a third data read/write request that is sent by the client and must be handled by the target storage server.
Optionally, the switching unit is further configured so that, when the target smart network card is detected to have recovered, if the distributed storage service deployed on the target storage server is running normally, the target smart network card sends a switching instruction to the target storage server and starts a timer, so that the target storage server sets its locally running distributed storage service to the to-be-started state, starts the RDMA channel with the target smart network card, and sends a switching-completion instruction to the target smart network card; if the target smart network card receives the switching-completion instruction, or has not received the switching-completion instruction when the timer expires, the target smart network card starts the distributed storage service and processes the third data read/write request that is sent by the client and must be handled by the target storage server.
Optionally, the device further includes:
a loading unit, which, when the distributed storage service is started on the target smart network card, loads the metadata stored in the storage resources of the target storage server into a third designated location in the memory of the target storage server, wherein, when a failure of the target smart network card is detected and the distributed storage service that is deployed on the target storage server and set to the to-be-started state is started, the target storage server processes the second data read/write request based on the metadata stored at the third designated location.
In a third aspect, an embodiment of the present application provides a distributed storage service switching device, which includes:
a memory, configured to store program instructions;
a processor, configured to call the program instructions stored in the memory and, according to the obtained program instructions, execute the steps of the method according to any one of the implementations of the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium that stores computer-executable instructions, the computer-executable instructions being used to cause a computer to execute the steps of the method according to any one of the implementations of the first aspect.
In summary, the distributed storage service switching method provided by the embodiments of the present application receives a first data read/write request sent by a client and determines a target storage server for processing the first data read/write request; sends the first data read/write request to the target smart network card corresponding to the target storage server, so that the target smart network card processes the first data read/write request based on the distributed storage service running locally on it; and, if a failure of the target smart network card is detected, starts the distributed storage service that is deployed on the target storage server and set to the to-be-started state, so that the target storage server, based on its locally running distributed storage service, processes a second data read/write request that is sent by the client and must be handled by the target storage server.
With the distributed storage service switching method provided by the embodiments of the present application, when a smart network card fails and can no longer provide the distributed storage service, the distributed storage service pre-deployed on the storage server can be started. This avoids the distributed storage system becoming unavailable because of a smart network card failure, improves the reliability of the distributed storage system, and enhances the user experience.
Description of drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in the present application; those of ordinary skill in the art can derive other drawings from them.
FIG. 1 is a schematic structural diagram of a distributed storage system provided by an embodiment of the present application;
FIG. 2 is a detailed flowchart of a distributed storage service switching method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a distributed storage service switching device provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of another distributed storage service switching device provided by an embodiment of the present application.
Detailed description
The terms used in the embodiments of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application. The singular forms "a", "said", and "the" used in the present application and the claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in the embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be called second information, and similarly, second information may also be called first information. In addition, depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
The structure of the distributed storage system provided by the embodiments of the present application is described in detail below with reference to a specific application scenario. As an example, FIG. 1 is a schematic structural diagram of a distributed storage system provided by the present application. The distributed storage system includes three storage servers (storage server 1, storage server 2, and storage server 3). Note that in the embodiments of the present application, each storage server has a distributed storage service deployed on it that is set to the to-be-started state. Each storage server is configured with a corresponding smart network card (smart network card 1, smart network card 2, and smart network card 3), and each storage server includes storage resources for storing data and a controller for managing the local storage resources. A distributed storage service runs on each smart network card, and a Remote Direct Memory Access (RDMA) channel is established between the smart network card configured for each storage server and the controller on that storage server that manages local storage resources. The storage resources managed by each controller may consist of multiple disks. Specifically, the disks may conform to the Non-Volatile Memory Express (NVMe) host controller interface specification, and are referred to in this specification as NVMe disks.
In other words, for a given storage server, while its corresponding smart network card is healthy, the smart network card provides the distributed storage service and processes the data read/write requests that are sent by clients and must be handled by that storage server. When the smart network card fails, the distributed storage service deployed locally on the storage server processes those requests instead.
Note that the system structure, the number of devices, the number of disks, and so on described above are only illustrative and are not limiting.
A so-called smart network card is a network card that can have an independent central processing unit (CPU). A smart network card may also be called an accelerator card; it has certain network and storage acceleration capabilities and can usually be implemented with a field-programmable gate array (FPGA). Because the smart network card can itself process data input/output for distributed storage, the storage server's processor can be freed from this work, leaving data input/output to the smart network card alone.
As an example, FIG. 2 is a detailed flowchart of a distributed storage service switching method provided by an embodiment of the present application. The method is applied to a distributed storage system in which each storage server is configured with a corresponding smart network card, a distributed storage service runs on each smart network card, each smart network card establishes an RDMA channel with the controller on its corresponding storage server that manages local storage resources, and each storage server has a distributed storage service deployed on it that is set to the to-be-started state. The method includes the following steps:
Step 200: Receive a first data read/write request sent by a client, and determine a target storage server for processing the first data read/write request.
Specifically, after the distributed storage system receives a data read/write request sent by a client, the management device acting as the management node assigns the request to one storage server in the distributed storage system for processing; that is, the management device sends the request to the smart network card corresponding to the storage server that is to process it. Note that the target storage server may be any storage server in the distributed storage system.
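Step 200 does not state how the management device chooses the target storage server. As one possible illustration only, the sketch below assumes hash-based placement on a request key; this technique, the function name `pick_target_server`, the `nic_for` mapping, and the sample key are all assumptions and are not taken from the application.

```python
import hashlib


def pick_target_server(object_key: str, servers: list) -> str:
    """Map a request key to one storage server (assumed hash placement)."""
    digest = hashlib.sha256(object_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Each storage server has exactly one corresponding smart network card.
nic_for = {
    "storage_server_1": "smart_nic_1",
    "storage_server_2": "smart_nic_2",
    "storage_server_3": "smart_nic_3",
}

servers = sorted(nic_for)
target = pick_target_server("volume-7/block-42", servers)
target_nic = nic_for[target]   # the request is forwarded to this card
```

Whatever placement rule is used, the request is delivered to the smart network card of the chosen server rather than to the server's own CPU while that card is healthy.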
Further, in this embodiment the distributed storage service must be running on the smart network card in advance. Specifically, the smart network card performs an initialization operation based on user configuration, so that the smart network card runs the distributed storage service and establishes an RDMA data channel with the controller that manages local storage resources on its corresponding storage server.
Specifically, in this embodiment, when the smart network card performs the initialization operation based on user configuration so that it runs the distributed storage service, a preferred implementation is as follows: based on a configuration instruction issued by the user, the smart network card runs the distributed storage service locally and forms a distributed storage service cluster with the other smart network cards that run the distributed storage service.
That is, in this embodiment the distributed storage service runs on the smart network card.
Further, when the smart network card performs the initialization operation based on user configuration so that it establishes an RDMA data channel with the controller that manages local storage resources on its corresponding storage server, a preferred implementation is as follows: based on a configuration command issued by the user, the smart network card configures the NVMe over Fabrics protocol locally and, acting as the initiator, establishes the RDMA data channel with that controller, where the controller is configured as the target of the NVMe over Fabrics protocol.
Specifically, when the smart network card, based on a configuration command issued by the user, configures the NVMe over Fabrics protocol locally and, acting as the initiator, establishes a remote direct memory access (RDMA) data channel with the controller that manages local storage resources on its corresponding storage server, a preferred implementation is as follows. Based on the configuration command issued by the user, the smart network card configures an RDMA IP address locally and, acting as the initiator, uses the initiator tool of the NVMe over Fabrics protocol to connect to that controller. The controller, also based on a configuration command issued by the user, configures an RDMA IP address on a local RDMA-capable network card interface and configures the NVMe over Fabrics protocol on a network card chip that has NVMe-oF target offload capability, thereby acting as the target. In other words, the network card chip included in the controller on each storage server integrates both the RDMA function and the NVMe-oF target offload capability.
For example, the smart network card corresponding to each storage server and each controller are initialized, the NVMe over Fabrics protocol is configured for each smart network card and each controller, the smart network card is connected as the initiator to the controller as the target, and the distributed storage service is started.
Optionally, the initialization process may specifically include the following steps:
Step 1: A region for storing the metadata that corresponds to the data stored in the corresponding storage server may be configured on each smart network card; this region may be called a metadata region. During initialization, the metadata stored in the storage resources is loaded into the metadata region.
In another preferred implementation, a region for storing the metadata that corresponds to the data stored in the storage resources of the storage server is configured in the memory of the storage server; this region may likewise be called a metadata region. In that case, during initialization the metadata in the storage resources is loaded into the metadata region in memory.
Step 2: Each controller may configure an RDMA Internet Protocol (IP) address and configure the NVMe over Fabrics protocol, acting as the target.
Step 3: Each smart network card may configure an RDMA IP address and configure the NVMe over Fabrics protocol and, acting as the initiator, use the initiator tool of the NVMe over Fabrics protocol to connect to its corresponding controller and start the distributed storage service.
Starting the distributed storage service may specifically include: each smart network card marks the disks allocated to it and records them locally, and writes the metadata corresponding to the data stored on the allocated disks into the metadata region configured by the smart network card, using the RDMA protocol.
Each smart network card can then perform input/output processing on the stored data by means of the RDMA protocol and the metadata corresponding to the stored data.
In this embodiment, when enabling the RDMA data channel of each storage server, a preferred implementation is to configure IP addresses such as 1.1.1.2, 1.1.2.2, ... on the RDMA-capable network card interface of each storage server; the RoCEv2 protocol may be used.
When enabling the RDMA data channel of each smart network card, a preferred implementation is to configure IP addresses such as 1.1.1.1, 1.1.2.1, ... on the RDMA-capable network card interface of each smart network card; the RoCEv2 protocol may be used.
That is, enabling the RDMA data channel on each storage server means, for example, configuring an IP address (e.g. IP 11) on the RDMA-capable network card interface of storage server 1, and enabling the RDMA data channel on each smart network card means, for example, configuring an IP address (e.g. IP 21) on the RDMA-capable network card interface of smart network card 1. If smart network card 1 is the one corresponding to storage server 1, then smart network card 1, acting as the initiator, can establish an RDMA data channel between its own network card interface configured with IP 21 and the network card interface on storage server 1 configured with IP 11.
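The addressing scheme described above can be sketched as a small planning helper. This is only an illustration: the names `RdmaPair` and `plan_pairs` are hypothetical, and the pattern 1.1.<n>.1 / 1.1.<n>.2 is extrapolated from the example addresses given in the text rather than prescribed by it.

```python
from __future__ import annotations

from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class RdmaPair:
    """One smart-NIC/storage-server pair and the RDMA address on each side."""
    index: int
    initiator_ip: IPv4Address  # smart network card side (NVMe-oF initiator)
    target_ip: IPv4Address     # storage server controller side (NVMe-oF target)


def plan_pairs(count: int) -> list[RdmaPair]:
    """Assign 1.1.<n>.1 to smart NIC n and 1.1.<n>.2 to storage server n."""
    if not 1 <= count <= 255:
        raise ValueError("count must be between 1 and 255")
    return [
        RdmaPair(n, IPv4Address(f"1.1.{n}.1"), IPv4Address(f"1.1.{n}.2"))
        for n in range(1, count + 1)
    ]


pairs = plan_pairs(2)
for pair in pairs:
    print(f"pair {pair.index}: initiator {pair.initiator_ip} -> target {pair.target_ip}")
```

Keeping each pair on its own subnet, as the example addresses suggest, makes it easy to see which smart network card is expected to connect to which controller.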
Step 210: Send the first data read/write request to the target smart network card corresponding to the target storage server, so that the target smart network card processes the first data read/write request based on the distributed storage service running locally on it.
Specifically, the management device in the distributed storage system sends the first data read/write request to the target smart network card corresponding to the target storage server that is to process the request. On receiving the first data read/write request, the target smart network card processes it based on the distributed storage service running locally.
For example, when the distributed storage service running on the target smart network card starts, if it determines that the distributed storage service deployed on the target storage server is in the to-be-started state, it may load the metadata from disk into the memory of the target storage server by means of RDMA writes, and it provides the distributed storage service once start-up is complete.
In this embodiment, when the target smart network card processes the first data read/write request based on the locally running distributed storage service, a preferred implementation is as follows: the target smart network card sends the first data read/write request to the controller that manages local storage resources on the target storage server, and the controller processes the first data read/write request through its corresponding RDMA channel.
For example, after receiving the first data read/write request, the target smart network card sends it to the controller in the corresponding target storage server. On receiving the request, the controller parses it, initiates a DMA operation through the RDMA data channel established with the target smart network card, and returns the read/write result to the target smart network card. In effect, the target smart network card directly accesses the storage resources on the target storage server and performs read/write operations on them.
Step 220: If a failure of the target smart network card is detected, start the distributed storage service that is deployed on the target storage server and set to the to-be-started state, so that the target storage server, based on the distributed storage service now running locally, processes a second data read/write request that is sent by the client and needs to be processed by the target storage server.
In this embodiment, a first designated location is preset in the memory of the target storage server, and during normal operation the target smart network card writes heartbeat count information to this first designated location at a preset period. A second designated location is also preset in the memory of the target storage server; after the distributed storage service deployed on the target storage server in the to-be-started state has been started, the target storage server writes heartbeat count information to the second designated location at a preset period.
Further, the running state (to-be-started or running) of the distributed storage service deployed on the target storage server may also be written to the second designated location. In practice, data can be written to and read from the first and second designated locations by means of memory access.
Accordingly, a preferred implementation for detecting a failure of the target smart network card is: when the heartbeat count maintained at the first designated location in the memory of the target storage server is found not to have increased within a preset duration, it is determined that a failure of the target smart network card has been detected.
That is, while the distributed storage service on the target smart network card is running normally, it periodically writes the heartbeat count to the first designated location in the memory of the target storage server, so the heartbeat count at that location increases over time. When the heartbeat count is found not to have increased within the preset duration, the target smart network card is determined to have failed.
For example, when the target smart network card fails (e.g. a software crash), it can no longer write the heartbeat count to the first designated location. The target storage server periodically checks the heartbeat count at the first designated location; if it determines that the count has not increased for a certain time, it judges the target smart network card to be abnormal and enters the switching process.
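The stalled-counter check described above can be sketched as follows. This is a minimal sketch under stated assumptions: the class name `HeartbeatWatcher`, the explicit `now` timestamps, and the 3-second timeout are all hypothetical and stand in for the "preset period" and "preset duration", which the text does not quantify.

```python
class HeartbeatWatcher:
    """Reports a peer as failed once its counter has not increased for `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_count = None    # most recent counter value seen to increase
        self.last_change = 0.0    # time at which that increase was observed

    def observe(self, count: int, now: float) -> bool:
        """Record one sample of the counter; return True if the peer is considered failed."""
        if self.last_count is None or count > self.last_count:
            self.last_count = count
            self.last_change = now
            return False
        return (now - self.last_change) >= self.timeout


# The counter advances until t=2 and then stalls; failure is reported at t=5.
watcher = HeartbeatWatcher(timeout=3.0)
samples = [(0.0, 1), (1.0, 2), (2.0, 3), (3.0, 3), (4.0, 3), (5.0, 3)]
results = [watcher.observe(count, now) for now, count in samples]
print(results)
```

Passing the timestamp in explicitly keeps the check deterministic; a real poller would read the counter from the designated memory location and take `now` from a monotonic clock.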
Specifically, the switching process is as follows. The target storage server shuts down the NVMe-oF service and starts the locally deployed distributed storage service that was set to the to-be-started state. Because the target smart network card has already written the latest metadata into the memory of the target storage server, the target storage server does not need to load metadata from disk when it starts the distributed storage service. If start-up succeeds, the target storage server periodically writes the heartbeat count to the second designated location (this can be done through memory access) and the switch is deemed successful; the target storage server may also write the running state of the locally deployed distributed storage service to the second designated location. If start-up does not succeed, the heartbeat count is not written periodically to the second designated location and the switch is deemed to have failed.
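The switching sequence above can be sketched as a small state change on the storage server. The sketch is illustrative only: `StorageServer`, `switch_over`, and the `start_service` callback are hypothetical names, and a single heartbeat increment stands in for the periodic writes.

```python
from dataclasses import dataclass, field
from enum import Enum


class ServiceState(Enum):
    PENDING = "to-be-started"
    RUNNING = "running"


@dataclass
class StorageServer:
    nvmeof_target_enabled: bool = True            # currently serving the smart NIC as target
    service_state: ServiceState = ServiceState.PENDING
    heartbeat: int = 0                            # stands in for the second designated location
    metadata: dict = field(default_factory=dict)  # in-memory metadata kept current by the NIC
    disk_metadata_loads: int = 0                  # how often metadata was re-read from disk

    def switch_over(self, start_service) -> bool:
        """Take over from a failed smart NIC; return True if the switch succeeded."""
        self.nvmeof_target_enabled = False        # shut down the NVMe-oF service
        # The in-memory metadata is already the latest, so nothing is loaded from disk.
        if not start_service(self.metadata):
            return False                          # no heartbeat is written: switch failed
        self.service_state = ServiceState.RUNNING
        self.heartbeat += 1                       # first of the periodic heartbeat writes
        return True


server = StorageServer(metadata={"vol0": "latest"})
switched = server.switch_over(lambda metadata: "vol0" in metadata)
print(switched, server.service_state.value, server.heartbeat)
```

The point the sketch tries to make visible is that `disk_metadata_loads` never changes: the takeover reuses the metadata that the smart network card left in server memory.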
It should be noted that, in this embodiment, when the distributed storage service is started on the target smart network card, the metadata stored in the storage resources of the target storage server is loaded into a third designated location in the memory of the target storage server. When a failure of the target smart network card is detected and the distributed storage service deployed on the target storage server in the to-be-started state is started, the target storage server processes the second data read/write request based on the metadata stored at the third designated location.
That is, when the smart network card starts the distributed storage service, the metadata on disk is loaded into a designated location in the memory of the storage server. While the smart network card processes data read/write requests based on the metadata stored at that location, it also updates the metadata. When a failure of the smart network card is detected and the distributed storage service is switched over to the storage server, the metadata stored at the designated location in the memory of the storage server is therefore the latest metadata, so the storage server does not need to load metadata again and can directly use the metadata at that location to process subsequently received data read/write requests.
Further, when the target smart network card is detected to have returned to normal, if the state of the distributed storage service deployed on the target storage server is to-be-started and/or the heartbeat count maintained at the second designated location in the memory of the target storage server has not increased within a preset duration, the target smart network card starts the distributed storage service and processes a third data read/write request that is sent by the client and needs to be processed by the target storage server.
For example, after the failure of the target smart network card has been repaired, it is determined whether the distributed storage service deployed on the target storage server is running normally. Specifically, the running state of that service is read from the second designated location. If the state is to-be-started, the target smart network card directly starts its local distributed storage service. If the state is running (started), the heartbeat count written by the target storage server is then read from the second designated location and used to judge whether the distributed storage service deployed on the target storage server is running normally; if it is not, the target smart network card directly starts its local distributed storage service and uses it to process subsequent data read/write requests that need to be processed by the target storage server.
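The recovery check above reduces to a two-input decision, sketched below. The function name `nic_may_start_directly` and the string values for the state are hypothetical; `heartbeat_advancing` is assumed to come from a stalled-counter check on the second designated location.

```python
def nic_may_start_directly(server_state: str, heartbeat_advancing: bool) -> bool:
    """Return True if the recovered smart NIC can start its local service at once."""
    if server_state == "to-be-started":
        return True  # the server-side service is not running, so there is nothing to hand back
    # The recorded state says "running": rely on it only while the heartbeat still advances.
    return not heartbeat_advancing


for state, advancing in [("to-be-started", False), ("running", False), ("running", True)]:
    print(state, advancing, nic_may_start_directly(state, advancing))
```

Only the last case, a running service with an advancing heartbeat, requires the explicit handover described next.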
Still further, when the target smart network card is detected to have returned to normal, if the distributed storage service deployed on the target storage server is running normally, the target smart network card sends a switching instruction to the target storage server and starts a timer, so that the target storage server sets its locally running distributed storage service to the to-be-started state, enables the RDMA channel with the target smart network card, and sends a switching-complete instruction to the target smart network card. If the target smart network card receives the switching-complete instruction, or has not received it by the time the timer expires, the target smart network card starts the distributed storage service and processes a third data read/write request that is sent by the client and needs to be processed by the target storage server.
For example, after the failure of the target smart network card has been repaired and the distributed storage service deployed on the target storage server is judged to be running normally, the target smart network card sends an acquire command to the target storage server, starts an acquire-command processing timer, and waits to receive an acquire finish command. On receiving the acquire command, the target storage server shuts down the locally running distributed storage service, enables the NVMe-oF target service, writes the running state of the locally deployed distributed storage service to the second designated location, and sends an acquire finish command to the target smart network card. If the target smart network card receives the acquire finish command before the timer expires, or has still not received it after the timer expires, the target smart network card directly starts its local distributed storage service and uses it to process subsequent data read/write requests that need to be processed by the target storage server.
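The acquire / acquire finish exchange with its timer can be sketched with a thread and a queue. Everything here is a stand-in chosen for illustration: `reclaim_service`, `cooperative_server`, and `silent_server` are hypothetical names, the queue replaces whatever transport carries the commands, and the timeout values are arbitrary.

```python
import queue
import threading


def reclaim_service(server_handler, timeout: float) -> str:
    """Send 'acquire', wait up to `timeout` seconds, then start the NIC-side service."""
    replies: queue.Queue = queue.Queue()
    # Starting the thread stands in for sending the acquire command to the server.
    threading.Thread(target=server_handler, args=(replies,), daemon=True).start()
    try:
        replies.get(timeout=timeout)  # wait for "acquire finish"
        outcome = "finish-received"
    except queue.Empty:
        outcome = "timer-expired"
    # In both cases the smart NIC now starts its local distributed storage service.
    return outcome


def cooperative_server(replies: queue.Queue) -> None:
    # A real server would stop its service, enable the NVMe-oF target,
    # and record its state before replying.
    replies.put("acquire finish")


def silent_server(replies: queue.Queue) -> None:
    pass  # never replies, for example because it has hung


print(reclaim_service(cooperative_server, timeout=1.0))
print(reclaim_service(silent_server, timeout=0.05))
```

Both outcomes end with the smart network card starting its service, which matches the text: the timer bounds how long the card waits, not whether it takes over.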
Based on the same inventive concept as the method embodiment above, and by way of example, FIG. 3 is a schematic structural diagram of a distributed storage service switching apparatus provided by an embodiment of the present application. The apparatus is applied to a distributed storage system in which each storage server is configured with a corresponding smart network card, a distributed storage service runs on each smart network card, each smart network card establishes an RDMA channel with the controller that manages local storage resources on its corresponding storage server, and a distributed storage service set to a to-be-started state is deployed on each storage server. The apparatus includes:
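a receiving unit 30, configured to receive a first data read/write request sent by a client and to determine a target storage server for processing the first data read/write request;
a sending unit 31, configured to send the first data read/write request to the target smart network card corresponding to the target storage server, so that the target smart network card processes the first data read/write request based on the distributed storage service running locally on it;
a switching unit 32, configured to start, when a failure of the target smart network card is detected, the distributed storage service that is deployed on the target storage server and set to the to-be-started state, so that the target storage server, based on the distributed storage service now running locally, processes a second data read/write request that is sent by the client and needs to be processed by the target storage server.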
Optionally, the step in which the target smart network card processes the first data read/write request based on the locally running distributed storage service includes:
the target smart network card sends the first data read/write request to the controller that manages local storage resources on the target storage server, and the controller processes the first data read/write request through its corresponding RDMA channel.
Optionally, during normal operation the target smart network card writes heartbeat count information to a first designated location in the memory of the target storage server at a preset period;
when detecting a failure of the target smart network card, the switching unit 32 is specifically configured to:
determine that a failure of the target smart network card has been detected when the heartbeat count maintained at the first designated location in the memory of the target storage server is found not to have increased within a preset duration.
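Optionally, after the distributed storage service deployed on the target storage server in the to-be-started state has been started, the target storage server writes heartbeat count information to a second designated location in memory at a preset period;
the switching unit 32 is further configured so that, when the target smart network card is detected to have returned to normal, if the state of the distributed storage service deployed on the target storage server is to-be-started and/or the heartbeat count maintained at the second designated location in the memory of the target storage server has not increased within a preset duration, the target smart network card starts the distributed storage service and processes a third data read/write request that is sent by the client and needs to be processed by the target storage server.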
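Optionally, the switching unit 32 is further configured so that, when the target smart network card is detected to have returned to normal, if the distributed storage service deployed on the target storage server is running normally, the target smart network card sends a switching instruction to the target storage server and starts a timer, so that the target storage server sets its locally running distributed storage service to the to-be-started state, enables the RDMA channel with the target smart network card, and sends a switching-complete instruction to the target smart network card; if the target smart network card receives the switching-complete instruction, or has not received it by the time the timer expires, the target smart network card starts the distributed storage service and processes a third data read/write request that is sent by the client and needs to be processed by the target storage server.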
Optionally, the apparatus further includes:
a loading unit which, when the distributed storage service is started on the target smart network card, loads the metadata stored in the storage resources of the target storage server into a third designated location in the memory of the target storage server, where, when a failure of the target smart network card is detected and the distributed storage service deployed on the target storage server in the to-be-started state is started, the target storage server processes the second data read/write request based on the metadata stored at the third designated location.
The units above may be one or more integrated circuits configured to implement the above method, for example one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field-programmable gate arrays (FPGA). As another example, when one of the units above is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of invoking program code. As a further example, these units may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Further, at the hardware level, a schematic diagram of the hardware architecture of the distributed storage service switching apparatus provided by this embodiment is shown in FIG. 4. The apparatus may include a memory 40 and a processor 41.
The memory 40 is configured to store program instructions; the processor 41 invokes the program instructions stored in the memory 40 and executes the method embodiment above in accordance with the obtained program instructions. The specific implementation and technical effects are similar and are not repeated here.
Optionally, the present application further provides a distributed storage service switching device, including at least one processing element (or chip) configured to execute the method embodiment above.
Optionally, the present application further provides a program product, such as a computer-readable storage medium, which stores computer-executable instructions for causing a computer to execute the method embodiment above.
Here, a machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid-state drive, any type of storage disc (such as an optical disc or DVD), a similar storage medium, or a combination of these.
The systems, apparatuses, modules, or units described in the embodiments above may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the apparatus above is described in terms of separate units divided by function. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Moreover, these computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means. The instruction means implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.
Claims (12)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011344216.1A CN112596960B (en) | 2020-11-25 | 2020-11-25 | Distributed storage service switching method and device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011344216.1A CN112596960B (en) | 2020-11-25 | 2020-11-25 | Distributed storage service switching method and device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112596960A (en) | 2021-04-02 |
| CN112596960B (en) | 2023-06-13 |
Family
ID=75184122
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011344216.1A Active CN112596960B (en) | 2020-11-25 | 2020-11-25 | Distributed storage service switching method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112596960B (en) |
Families Citing this family (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113253925B (en) * | 2021-04-30 | 2022-08-30 | 新华三大数据技术有限公司 | Method and device for optimizing read-write performance |
| CN113282246B (en) * | 2021-06-15 | 2023-07-04 | 杭州海康威视数字技术股份有限公司 | Data processing method and device |
| CN113824812B (en) * | 2021-08-27 | 2023-02-28 | 济南浪潮数据技术有限公司 | Method, device and storage medium for HDFS service to acquire service node IP |
| CN114338721B (en) * | 2021-12-28 | 2024-01-02 | 中国电信股份有限公司 | Data processing method and device, target network system and readable storage medium |
| CN114327903B (en) * | 2021-12-30 | 2023-11-03 | 苏州浪潮智能科技有限公司 | NVMe-oF management system, resource allocation method and IO read-write method |
| CN116560900A (en) * | 2022-01-30 | 2023-08-08 | 华为技术有限公司 | Method for reading data or method for writing data and related system |
| CN114546279B (en) * | 2022-02-24 | 2023-11-14 | 重庆紫光华山智安科技有限公司 | IO request prediction method and device, storage node and readable storage medium |
| CN115022328B (en) * | 2022-06-24 | 2023-08-08 | 脸萌有限公司 | Server cluster, testing method and device of server cluster and electronic equipment |
| CN115242807B (en) * | 2022-06-30 | 2024-07-05 | 深圳震有科技股份有限公司 | Data access method in 5G communication system and related equipment |
| CN116170284B (en) * | 2023-02-24 | 2025-09-09 | 济南浪潮数据技术有限公司 | Fault optimization method, system, equipment and medium of client |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TW200412071A (en) * | 2002-12-31 | 2004-07-01 | Inventec Corp | Method for balancing load of network interface card testing |
| CN101115054A (en) * | 2006-07-26 | 2008-01-30 | 惠普开发有限公司 | Memory-mapped buffers for network interface controllers |
| US7739543B1 (en) * | 2003-04-23 | 2010-06-15 | Netapp, Inc. | System and method for transport-level failover for loosely coupled iSCSI target devices |
| CN107085503A (en) * | 2017-03-27 | 2017-08-22 | 联想(北京)有限公司 | storage device, storage system and information processing method |
| CN109327539A (en) * | 2018-11-15 | 2019-02-12 | 上海天玑数据技术有限公司 | A kind of distributed block storage system and its data routing method |
Family Cites Families (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7457861B1 (en) * | 2003-12-05 | 2008-11-25 | Unisys Corporation | Optimizing virtual interface architecture (VIA) on multiprocessor servers and physically independent consolidated NICs |
| US9313274B2 (en) * | 2013-09-05 | 2016-04-12 | Google Inc. | Isolating clients of distributed storage systems |
| US10257273B2 (en) * | 2015-07-31 | 2019-04-09 | Netapp, Inc. | Systems, methods and devices for RDMA read/write operations |
| US10713210B2 (en) * | 2015-10-13 | 2020-07-14 | Microsoft Technology Licensing, Llc | Distributed self-directed lock-free RDMA-based B-tree key-value manager |
| US9836368B2 (en) * | 2015-10-22 | 2017-12-05 | Netapp, Inc. | Implementing automatic switchover |
| US20180336061A1 (en) * | 2017-05-16 | 2018-11-22 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd. | Storing file portions in data storage space available to service processors across a plurality of endpoint devices |
| US10503590B2 (en) * | 2017-09-21 | 2019-12-10 | International Business Machines Corporation | Storage array comprising a host-offloaded storage function |
| US11347678B2 (en) * | 2018-08-06 | 2022-05-31 | Oracle International Corporation | One-sided reliable remote direct memory operations |
2020-11-25: Application CN202011344216.1A filed (patent CN112596960B, status: Active)
Also Published As
| Publication number | Publication date |
|---|---|
| CN112596960A (en) | 2021-04-02 |
Similar Documents
| Publication | Title |
|---|---|
| CN112596960B (en) | Distributed storage service switching method and device |
| US9413683B2 (en) | Managing resources in a distributed system using dynamic clusters |
| TWI453597B (en) | System and method for management of an iov adapter through a virtual intermediary in an iov management partition |
| CN110704161B (en) | Virtual machine creation method and device and computer equipment |
| US10606677B2 (en) | Method of retrieving debugging data in UEFI and computer system thereof |
| US11797287B1 (en) | Automatically terminating deployment of containerized applications |
| CN113204407A (en) | Memory over-allocation management method and device |
| WO2024193096A1 (en) | Data migration method and computing device |
| CN114550773A (en) | Memory controller, memory system, and data processing method |
| WO2025148490A1 (en) | Storage space management method and management device |
| CN111666184A (en) | Solid state drive SSD hard disk test method and device and electronic equipment |
| CN115481052B (en) | A data exchange method and apparatus |
| CN112596669A (en) | Data processing method and device based on distributed storage |
| EP4521228A1 (en) | System deployment method, apparatus, electronic device and storage medium |
| US20230244417A1 (en) | Storage node, storage device, and network chip |
| CN116301610A (en) | A data processing method and related equipment |
| CN119668489A (en) | A data processing method and related device |
| US12093528B2 (en) | System and method for managing data access in distributed systems |
| US20250377931A1 (en) | Container live migration method, processor, host, chip, and interface card |
| US20240256352A1 (en) | System and method for managing data retention in distributed systems |
| US12339754B1 (en) | Systematic topology spread of pods to data processing systems |
| US12174703B2 (en) | System and method for managing recovery of management controllers |
| US20250217239A1 (en) | Distributed storage system and operating method thereof |
| US20230114771A1 (en) | Target triggered io classification using computational storage tunnel |
| CN117435212A (en) | Bare metal server management methods and related devices |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |