CN118626008A

CN118626008A - Metadata management method, device, electronic device and readable storage medium

Info

Publication number: CN118626008A
Application number: CN202410690377.8A
Authority: CN
Inventors: 陈志广; 陈志涛; 卢宇彤
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2024-05-30
Filing date: 2024-05-30
Publication date: 2024-09-10

Abstract

The application discloses a metadata management method, a device, an electronic device and a readable storage medium, relating to the technical field of large-scale data storage. The method is applied to a server side configured on a metadata management system, where the server side is communicatively connected to a memory pool of the metadata management system through a remote direct memory access network. The method comprises: in response to an access request sent by a client, acquiring path information of a node to be accessed from the access request; calling the remote direct memory access network to search the memory pool of the metadata management system for the memory address of the node to be accessed according to the path information; and acquiring access metadata of the node to be accessed through the memory address and performing a data operation on the access metadata. The application overcomes the excessive latency of data reads and writes caused by the overly deep input/output software stack of traditional file systems.

Description

Metadata management method, device, electronic device and readable storage medium

Technical Field

The present application relates to the technical field of large-scale data storage, and in particular to a metadata management method, device, electronic device, and readable storage medium.

Background Art

The continuous development of digitalization and informatization drives the ongoing evolution of infrastructure such as computing, storage and networks. As enterprises worldwide advance their digital transformation and scientific research relies increasingly on big data processing, big data technologies represented by cloud computing and artificial intelligence are developing rapidly, and the global volume of data is growing exponentially. Data centers therefore face the serious challenge of storing and managing massive amounts of data efficiently.

Currently, most data centers use traditional storage devices (block devices such as SSDs (Solid-State Disks) and HDDs (Hard Disk Drives)) together with traditional file management systems. In traditional file systems designed for block storage devices such as SSDs and HDDs, a data read or write request issued through the file system must pass through an overly deep I/O (Input/Output) software stack, consisting of the virtual file system, the block device file system, the generic block layer, the I/O scheduling layer, the block device driver layer and so on, before the actual read or write is performed. In addition, block devices themselves read and write slowly, so overall the access latency of traditional block devices and traditional file systems is too high.

Although high-performance storage devices such as persistent memory, which read and write faster than SSDs and HDDs, can improve read and write speed at the hardware level, requests issued to them through a traditional file system must still traverse the overly deep I/O software stack. This limits the achievable speed of the high-performance device and causes large access latency. There is therefore a technical problem of excessive access latency caused by the overly deep input/output software stack of traditional file systems.

The above content is provided only to assist in understanding the technical solution of the present application and does not constitute an admission that it is prior art.

Summary of the Invention

The main purpose of this application is to provide a metadata management method, device, electronic device and readable storage medium, aiming to solve the technical problem of excessive access latency caused by the overly deep input/output software stack of traditional file systems.

To achieve the above object, the present application provides a metadata management method applied to a server configured on a metadata management system, where the server establishes a communication connection with a memory pool of the metadata management system through a remote direct memory access network. The method comprises:

In response to an access request sent by a client, path information of a node to be accessed is obtained from the access request; the remote direct memory access network is called to search the memory pool of the metadata management system for the memory address of the node to be accessed according to the path information; and access metadata of the node to be accessed is obtained through the memory address, and data operations are performed on the access metadata.

In one embodiment, the step of calling the remote direct memory access network and searching the memory pool of the metadata management system for the memory address of the node to be accessed according to the path information includes: determining the root directory node from the path information, calling the remote direct memory access network, and reading, from the memory pool, the root metadata bucket in which the root directory node is located; taking the root directory node as the target node, searching the root metadata bucket for the target hash bucket matching the target hash value of the target node, and searching that target hash bucket for the target metadata of the target node; when the access rights in the target metadata are open to the client, obtaining the secondary path node of the target node from the path information, and calculating the secondary hash value of the secondary path node from the target hash value and the secondary name of the secondary path node; searching the root metadata bucket for the secondary hash bucket matching the secondary hash value, and, if the secondary hash bucket contains the node metadata of the secondary path node, determining whether the secondary path node is the node to be accessed; and if so, determining the memory address according to the secondary hash value.
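The path walk above can be sketched as follows. This is a minimal, single-process sketch under stated assumptions, not the application's implementation: the root metadata bucket is a local Python object standing in for a bucket read over RDMA, `chain_hash` (FNV-1a seeded with the parent's hash) is a hypothetical choice for calculating the secondary hash value from the target hash value and the secondary name, access rights are modeled as a hypothetical set of allowed client IDs, and a (hash bucket, slot) pair stands in for the memory address.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

NUM_HASH_BUCKETS = 64          # hypothetical: hash buckets per metadata bucket
MASK64 = 0xFFFFFFFFFFFFFFFF
ROOT_PARENT = 0                # hypothetical parent hash used for the root node


def chain_hash(parent_hash: int, name: str) -> int:
    """FNV-1a over `name`, seeded with the parent's hash (hypothetical choice)."""
    h = parent_hash ^ 0xCBF29CE484222325
    for byte in name.encode():
        h ^= byte
        h = (h * 0x100000001B3) & MASK64
    return h


@dataclass
class NodeMeta:
    name: str
    parent_hash: int
    allowed_clients: set = field(default_factory=set)  # hypothetical access rights


Address = Tuple[int, int]  # (hash bucket index, slot) stands in for a memory address


class MetadataBucket:
    """Local stand-in for a metadata bucket that would be read over RDMA."""

    def __init__(self) -> None:
        self.hash_buckets = [[] for _ in range(NUM_HASH_BUCKETS)]

    def add_node(self, parent_hash: int, name: str, clients=()) -> int:
        h = chain_hash(parent_hash, name)
        self.hash_buckets[h % NUM_HASH_BUCKETS].append(
            NodeMeta(name, parent_hash, set(clients)))
        return h

    def locate(self, h: int, name: str, parent_hash: int) -> Optional[Address]:
        idx = h % NUM_HASH_BUCKETS
        for slot, meta in enumerate(self.hash_buckets[idx]):
            # Match name *and* parent hash so collisions in one bucket are resolved.
            if meta.name == name and meta.parent_hash == parent_hash:
                return (idx, slot)
        return None

    def get(self, addr: Address) -> NodeMeta:
        return self.hash_buckets[addr[0]][addr[1]]


def resolve(root_bucket: MetadataBucket, path: str, client: str) -> Optional[Address]:
    names = path.split("/")
    target_hash = chain_hash(ROOT_PARENT, names[0])         # root directory node
    addr = root_bucket.locate(target_hash, names[0], ROOT_PARENT)
    for name in names[1:]:
        if addr is None or client not in root_bucket.get(addr).allowed_clients:
            return None                                     # missing node or access denied
        parent_hash = target_hash
        target_hash = chain_hash(parent_hash, name)         # secondary hash value
        addr = root_bucket.locate(target_hash, name, parent_hash)
    return addr                                             # last node = node to be accessed
```

Matching on both the name and the parent hash keeps two entries that land in the same hash bucket from being confused. The sketch omits the extended metadata bucket fallback.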

In one embodiment, after the step of searching the root metadata bucket for the secondary hash bucket matching the secondary hash value, the method further includes: if the secondary hash bucket does not contain the node metadata of the secondary path node, determining whether the parent node of the secondary path node has an extended metadata bucket; if it does, reading the extended metadata bucket from the memory pool through the remote direct memory access network and searching it for the extended hash bucket of the secondary path node; and if the extended hash bucket contains the node metadata of the secondary path node and the secondary path node is the node to be accessed, determining the memory address according to the secondary hash value.
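The fallback to a parent's extended metadata bucket might look like this. Again this is a hedged sketch, not the application's code: the memory pool is a dictionary from hypothetical bucket IDs to lists of hash buckets, `rdma_read_bucket` is a placeholder for a one-sided RDMA read, and the `ext_bucket` field on a parent's metadata is an assumed way of recording that an extended metadata bucket exists. The child's hash value is passed in rather than computed, so any hashing scheme can be plugged in.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

NUM_HASH_BUCKETS = 16  # hypothetical number of hash buckets per metadata bucket


@dataclass
class Meta:
    name: str
    parent_hash: int
    ext_bucket: Optional[int] = None  # hypothetical id of an extended metadata bucket


# Memory pool simulated as: bucket id -> list of hash buckets -> entries.
Pool = Dict[int, List[List[Meta]]]


def empty_bucket() -> List[List[Meta]]:
    return [[] for _ in range(NUM_HASH_BUCKETS)]


def rdma_read_bucket(pool: Pool, bucket_id: int) -> List[List[Meta]]:
    """Placeholder for a one-sided RDMA READ of a whole metadata bucket."""
    return pool[bucket_id]


def search(bucket: List[List[Meta]], h: int, name: str, parent_hash: int) -> Optional[Meta]:
    for meta in bucket[h % NUM_HASH_BUCKETS]:
        if meta.name == name and meta.parent_hash == parent_hash:
            return meta
    return None


def find_child(pool: Pool, root_id: int, parent: Meta, parent_hash: int,
               name: str, child_hash: int) -> Optional[Meta]:
    # 1. Look in the secondary hash bucket of the root metadata bucket.
    meta = search(rdma_read_bucket(pool, root_id), child_hash, name, parent_hash)
    # 2. Not found: fall back to the parent's extended metadata bucket, if any.
    if meta is None and parent.ext_bucket is not None:
        ext = rdma_read_bucket(pool, parent.ext_bucket)
        meta = search(ext, child_hash, name, parent_hash)  # extended hash bucket
    return meta
```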

In one embodiment, the step of performing data operations on the access metadata includes: when the access request is a write request, calling a remote direct memory access replacement operation to determine whether the lock area of the access metadata bucket containing the access metadata is empty; if it is empty, determining that the server is the modification master node, writing the master configuration information of the modification master node into the lock area of the access metadata bucket through the remote direct memory access replacement operation so as to obtain write permission on the access metadata bucket, and performing the write operation on the access metadata in the access metadata bucket; if it is not empty, determining that the server is a modification slave node, obtaining the master configuration information from the lock area of the access metadata bucket, establishing a remote direct memory access network connection with the modification master node according to the master configuration information, and sending the access metadata to the modification master node so that the modification master node performs the write operation on it.
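The master/slave decision can be sketched as below, assuming that the "remote direct memory access replacement operation" behaves like an atomic compare-and-swap, which is how one-sided RDMA atomics are commonly used for locks; the application does not spell this out. Here the emptiness check and the write of the master configuration are folded into a single compare-and-swap, and a local mutex stands in for NIC-side atomicity. `LockArea`, `acquire_role` and the `host:port` configuration strings are hypothetical.

```python
import threading
from typing import Optional, Tuple


class LockArea:
    """Simulated lock area of an access metadata bucket.

    compare_and_swap mimics an atomic RDMA compare-and-swap: it installs `new`
    only if the current value equals `expected`, and always returns the old value.
    """

    def __init__(self) -> None:
        self._value: Optional[str] = None  # None means "empty"
        self._mutex = threading.Lock()     # stands in for NIC-side atomicity

    def compare_and_swap(self, expected: Optional[str], new: str) -> Optional[str]:
        with self._mutex:
            old = self._value
            if old == expected:
                self._value = new
            return old


def acquire_role(lock: LockArea, my_config: str) -> Tuple[str, str]:
    """Return ("master", my_config) or ("slave", master_config)."""
    old = lock.compare_and_swap(None, my_config)
    if old is None:
        return "master", my_config  # our config is in the lock area: write permission held
    return "slave", old             # forward writes to the master described by `old`
```

Because the emptiness check and the write happen in one atomic step, two servers racing for the same bucket cannot both see an empty lock area and both become the modification master.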

In one embodiment, the step of writing the master configuration information of the modification master node into the lock area of the access metadata bucket through the remote direct memory access replacement operation, obtaining write permission on the access metadata bucket, and performing the write operation on the access metadata includes: writing the node configuration information of the modification master node into the lock area of the access metadata bucket through the replacement operation, creating a log space on the modification master node, and determining the remote direct memory access configuration information of the log space; registering that configuration information on the modification master node and writing it into the lock area, thereby obtaining write permission on the access metadata bucket; and, once write permission on the access metadata bucket has been obtained, performing the write operation on the access metadata.
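A sketch of how the modification master could set up and publish its log space. The three-part layout (an index area, a log area and a result area) follows this and the following embodiments; `LOG_SLOTS`, `RdmaConfig` (an address plus a remote key), `register_memory` and the dict used as the lock area are hypothetical stand-ins. In this sketch the first write to the lock area is a plain check-and-write, whereas a real system would perform it with the atomic replacement operation.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional

LOG_SLOTS = 8  # hypothetical number of log/result entries


@dataclass
class LogSpace:
    """Log space on the modification master: index area + log area + result area."""
    index: int = 0  # index area, incremented remotely by slaves
    log_area: List[Optional[dict]] = field(default_factory=lambda: [None] * LOG_SLOTS)
    result_area: List[Optional[int]] = field(default_factory=lambda: [None] * LOG_SLOTS)


@dataclass(frozen=True)
class RdmaConfig:
    """Hypothetical stand-in for what a peer needs to reach registered memory."""
    addr: int
    rkey: int


@dataclass(frozen=True)
class LockRecord:
    node_config: str                   # e.g. host:port of the modification master
    log_config: Optional[RdmaConfig]   # filled in once the log space is registered


_rkeys = itertools.count(0x1000)


def register_memory(region: LogSpace) -> RdmaConfig:
    """Placeholder for memory registration (ibv_reg_mr in a real verbs stack)."""
    return RdmaConfig(addr=id(region), rkey=next(_rkeys))


def become_master(lock_area: dict, node_config: str) -> Optional[LogSpace]:
    if lock_area.get("record") is not None:
        return None                                       # another server is master
    lock_area["record"] = LockRecord(node_config, None)   # 1. write node configuration
    log = LogSpace()                                      # 2. create the log space
    cfg = register_memory(log)                            # 3. register it, get RDMA config
    lock_area["record"] = LockRecord(node_config, cfg)    # 4. publish config in lock area
    return log                                            # write permission now held
```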

In one embodiment, after the step of obtaining write permission on the access metadata bucket, the method further includes: monitoring whether the index area of the log space has been updated; if so, obtaining the index increment from the index area, and obtaining, from the log area associated with the index increment, the metadata to be modified and its modification information; modifying that metadata in the access metadata bucket according to the modification information; and updating the modification status of that metadata in the result area associated with the index increment.
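The master-side loop that watches the index area might be sketched like this. The assumptions are all hypothetical: each index increment owns one slot of a fixed-size ring in the log and result areas, a log entry is a dict with a metadata `key` and a `changes` dict, and the "modification status" written to the result area is simply the completed sequence number, so a slave can tell its own completion apart from an older one in the same slot. Ring back-pressure (a slave lapping an unfinished slot) is not handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

LOG_SLOTS = 8  # hypothetical ring size


@dataclass
class LogSpace:
    index: int = 0  # index area
    log_area: List[Optional[dict]] = field(default_factory=lambda: [None] * LOG_SLOTS)
    result_area: List[Optional[int]] = field(default_factory=lambda: [None] * LOG_SLOTS)


def master_poll_once(log: LogSpace, bucket: Dict[str, dict], last_seen: int) -> int:
    """Apply every log entry published since `last_seen`; return the new watermark."""
    current = log.index                     # has the index area been updated?
    for seq in range(last_seen, current):   # each index increment owns one slot
        slot = seq % LOG_SLOTS
        entry = log.log_area[slot]
        if entry is None:
            break  # slot reserved but payload not written yet; retry on the next poll
        bucket.setdefault(entry["key"], {}).update(entry["changes"])  # modify metadata
        log.log_area[slot] = None
        log.result_area[slot] = seq         # record the modification status
        last_seen = seq + 1
    return last_seen
```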

In one embodiment, the access metadata includes metadata to be modified, and the step of sending the access metadata to the modification master node so that the modification master node performs the write operation includes: calling a remote direct memory access self-increment operation to increment the index in the index area of the log space of the modification master node, obtaining the index increment of the modification slave node; writing, according to the index increment, the metadata to be modified into the log area associated with the index increment, so that the modification master node can modify it; and polling the result area associated with the index increment and, upon detecting that the modification status of the metadata to be modified has been updated, returning the modification result to the client of the modification slave node.
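On the slave side, the three RDMA steps (self-increment of the index, write to the log area, polling of the result area) can be simulated in one process. This sketch assumes the self-increment operation behaves like an RDMA atomic fetch-and-add that returns the previous value; `fetch_and_add`, `slave_submit`, the ring size and the sequence-number completion convention are hypothetical, and ordinary attribute writes and reads stand in for RDMA WRITE and READ.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import List, Optional

LOG_SLOTS = 8  # hypothetical ring size


@dataclass
class LogSpace:
    index: int = 0  # index area on the modification master
    log_area: List[Optional[dict]] = field(default_factory=lambda: [None] * LOG_SLOTS)
    result_area: List[Optional[int]] = field(default_factory=lambda: [None] * LOG_SLOTS)
    _atomic: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def fetch_and_add(self, delta: int = 1) -> int:
        """Stand-in for an RDMA atomic fetch-and-add on the index area."""
        with self._atomic:
            old = self.index
            self.index += delta
            return old


def slave_submit(log: LogSpace, key: str, changes: dict, timeout: float = 1.0) -> bool:
    seq = log.fetch_and_add()                            # 1. index self-increment
    slot = seq % LOG_SLOTS
    log.log_area[slot] = {"key": key, "changes": changes}  # 2. write into the log area
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:                   # 3. poll the result area
        if log.result_area[slot] == seq:
            return True                                  # report success to the client
        time.sleep(0.001)
    return False                                         # master did not respond in time
```

Because the increment and the write are two separate RDMA operations, the master can observe a new index before the matching log entry arrives; a master loop that stops at the first empty slot and retries on the next poll handles that case.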

In addition, to achieve the above purpose, the present application provides a metadata management device applied to a server configured on a metadata management system, where the server establishes a communication connection with a memory pool of the metadata management system through a remote direct memory access network. The device comprises: a response module, configured to obtain, in response to an access request sent by a client, path information of a node to be accessed from the access request;

A calling module, configured to call the remote direct memory access network and search the memory pool of the metadata management system for the memory address of the node to be accessed according to the path information; and

An operation module, configured to obtain the access metadata of the node to be accessed through the memory address and perform data operations on the access metadata.
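The three modules map one-to-one onto the three method steps. The skeleton below is illustrative only: the class names, `AccessRequest`, and the plain dictionaries used for the path lookup and the memory pool are hypothetical simplifications, and the dictionary lookup replaces the RDMA-based hash lookup described in the embodiments.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AccessRequest:
    client_id: str
    path: str
    op: str                          # "read" or "write"
    changes: Optional[dict] = None


class ResponseModule:
    def handle(self, req: AccessRequest) -> str:
        return req.path              # path information of the node to be accessed


class CallingModule:
    def __init__(self, index: Dict[str, int]) -> None:
        self.index = index           # stand-in for the RDMA lookup in the memory pool

    def lookup(self, path: str) -> Optional[int]:
        return self.index.get(path)  # memory address, or None if not found


class OperationModule:
    def __init__(self, pool: Dict[int, dict]) -> None:
        self.pool = pool             # memory address -> access metadata

    def operate(self, addr: int, req: AccessRequest) -> dict:
        meta = self.pool[addr]
        if req.op == "write" and req.changes:
            meta.update(req.changes)  # data operation on the access metadata
        return dict(meta)             # return a copy as the response


class MetadataManagementDevice:
    def __init__(self, index: Dict[str, int], pool: Dict[int, dict]) -> None:
        self.response = ResponseModule()
        self.calling = CallingModule(index)
        self.operation = OperationModule(pool)

    def serve(self, req: AccessRequest) -> Optional[dict]:
        path = self.response.handle(req)
        addr = self.calling.lookup(path)
        if addr is None:
            return None
        return self.operation.operate(addr, req)
```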

In addition, to achieve the above purpose, the present application further provides an electronic device comprising a memory, a processor, and a program of the metadata management method that is stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the metadata management method described above.

In addition, the present application further provides a computer-readable storage medium storing a program that implements the metadata management method; when executed by a processor, the program implements the steps of the metadata management method described above.

In addition, the present application further provides a computer program product comprising a computer program that, when executed by a processor, implements the steps of the metadata management method described above.

One or more technical solutions proposed in the present application have at least the following technical effects. The present application is applied to a server configured on a metadata management system, and the server communicates with a memory pool of the metadata management system through a remote direct memory access (RDMA) network, so the server can access the memory pool directly. When the server receives an access request from a client and obtains the path information of the node to be accessed, it calls the RDMA network to access the memory pool directly and quickly determine the memory address of the node in the memory pool; it then obtains the access metadata of the node through that address and performs data operations on it. Access therefore no longer has to pass through an I/O software stack composed of a virtual file system, a block device file system, a generic block layer and other layers, which reduces access latency. By accessing the memory pool directly over RDMA, the application overcomes the excessive access latency caused by the overly deep input/output (I/O) software stack of traditional file systems.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings herein are incorporated in and constitute a part of the specification, illustrate embodiments consistent with the present application, and together with the description serve to explain the principles of the present application.

To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. A person of ordinary skill in the art can obviously derive other drawings from these drawings without creative effort.

FIG. 1 is a flow chart of a first embodiment of the metadata management method of the present application;

FIG. 2 is a schematic diagram of a disaggregated persistent memory architecture cluster in the metadata management method of the present application;

FIG. 3 is a schematic diagram of how a directory tree is stored in the memory pool in the metadata management method of the present application;

FIG. 4 is a flow chart of a second embodiment of the metadata management method of the present application;

FIG. 5 is a schematic diagram of the log space of the modification master node in the metadata management method of the present application;

FIG. 6 is a schematic diagram of the module structure of the metadata management device according to an embodiment of the present application;

FIG. 7 is a schematic diagram of the device structure of the hardware operating environment involved in the metadata management method in the embodiments of the present application.

The purpose, features and advantages of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

DETAILED DESCRIPTION

It should be understood that the specific embodiments described herein are only used to explain the technical solutions of the present application and are not intended to limit the present application.

To better understand the technical solution of the present application, it is described in detail below with reference to the drawings and specific implementations. The main solution of the embodiments is to provide a new file management system for the disaggregated persistent memory architecture, namely the metadata management system of the embodiments. Specifically, the metadata management system is configured with servers, a memory pool and clients. The memory pool, which runs on the memory nodes of the disaggregated persistent memory architecture, is equipped with a large amount of persistent memory for durable storage of file metadata. The server, which runs on the computing nodes of the architecture, provides the file metadata service: it processes metadata read and write requests sent by clients and returns responses. The client provides a file access interface for upper-layer applications (such as applications on a computer) to call, and can run on any server that can communicate with the metadata server.
The computing nodes hosting the servers and the memory nodes hosting the memory pool are interconnected through an RDMA network, so a server can directly read and write the persistent memory in the memory pool without involving the processors of the memory nodes. Servers can also be interconnected through the RDMA network, so that they can access each other directly without going through each other's processors.

To better understand this application, the disaggregated persistent memory architecture is explained below. It combines persistent memory technology with disaggregated memory technology. Most data centers currently use a traditional server architecture in which the computing, storage and network resources of each independent server are tightly bound, and servers are interconnected over the network to scale dynamically. With this architecture, hardware resources can only be added or removed in units of whole servers, and a specific type of hardware resource cannot be allocated flexibly, which can seriously waste some resources when workloads have very different resource requirements. As a result, the storage systems of traditional data centers suffer from insufficient performance, low flexibility and poor scalability, which has become a bottleneck for the further development of computer systems in the era of massive data; breaking through the data I/O bottleneck has become one of the most urgent needs for improving computer system performance. Against this background, persistent memory technology and disaggregated memory technology have emerged:

1. Persistent memory technology: Research institutions in academia and industry have proposed various persistent memory (PM) implementations, such as spin-transfer torque random access memory and 3D XPoint. Persistent memory offers persistence, byte addressability, high storage density, low energy consumption and low access latency. Like dynamic random access memory (DRAM), it can be attached directly to the memory bus, so the CPU can address and access it directly with load/store instructions. Persistent memory therefore lets a computer system persist data in main memory with read and write performance close to that of DRAM. Its byte addressability removes the access restrictions of block devices such as SSDs and HDDs, and its more flexible read and write capabilities open up considerable room for optimizing data access.

2. Disaggregated memory technology: To address the low resource utilization and inflexible scaling of data centers built on the traditional server architecture, disaggregated data centers have emerged. Their central idea is to split the hardware resources of the data center by type into separate resource pools, such as a computing pool, a memory pool and a storage pool. Disaggregated memory (DM) technology is an important part of this approach: it separates DRAM from the servers to form a memory pool that computing nodes can use flexibly, connected to the computing nodes through high-speed interconnects such as RDMA, CXL (Compute Express Link) and NVMe-oF (NVMe over Fabrics). To avoid triggering network communication on every memory access, each computing node is configured with a small amount of local memory as a cache for the memory pool; similarly, the memory pool is equipped with a small amount of computing resources for simple processing.
Under a disaggregated memory architecture that separates storage from computing, hardware resources can be allocated and released flexibly and scaled out or in independently, which largely avoids wasting resources. Combining disaggregated memory with persistent memory yields disaggregated persistent memory, which breaks the performance limits of traditional SSDs and HDDs; the disaggregated architecture further improves resource utilization, can provide more efficient data storage services for big data applications, and alleviates the weaknesses of traditional data center storage. However, the disaggregated persistent memory architecture is a hardware improvement, and data on that hardware must still be accessed through software such as the operating system, for example through a file system. A file system is a system-level component of the operating system that provides applications running on the computer with a unified interface for accessing data on storage media. Most traditional file systems, such as EXT4 (the fourth extended filesystem), are designed for block devices such as SSDs and HDDs. An access request issued by an application is processed by an I/O software stack consisting of the virtual file system, the block device file system, the generic block layer, the I/O scheduling layer and the block device driver layer before the block device is actually read or written.

However, persistent memory is byte-addressable and can be read and written directly with load/store instructions, so file systems designed for block devices cannot fully exploit its characteristics and performance. Furthermore, in the disaggregated persistent memory architecture, the computing nodes and the memory pool communicate over a high-speed RDMA network. Traditional local file systems cannot be applied to this architecture, and traditional network file systems cannot take advantage of RDMA features such as zero copy, kernel bypass and bypassing the remote CPU (Central Processing Unit). It is therefore difficult to realize the advantages of the high-performance disaggregated persistent memory architecture: even with such hardware, access latency remains high because of the overly deep I/O software stack of traditional file systems.

The present application provides a solution in which the server of the metadata management system accesses the memory pool of the metadata management system directly through a remote direct memory access network. When the server receives an access request from a client and obtains the path information of the node to be accessed, it calls the RDMA network to access the memory pool directly and quickly determine the node's memory address in the memory pool; it then obtains the access metadata of the node through that address and performs data operations on it. Access no longer has to pass through an I/O software stack made up of the virtual file system, the block device file system, the generic block layer and other layers, which reduces access latency. By accessing the memory pool directly over RDMA instead of through an overly deep I/O software stack, the application overcomes the excessive access latency caused by the I/O software stack of traditional file systems.

Based on this, an embodiment of the present application provides a metadata management method. Refer to FIG. 1, which is a flowchart of a first embodiment of the metadata management method of the present application.

In this embodiment, the method is applied to a server configured on a metadata management system. The server has established a communication connection with the memory pool of the metadata management system through the remote direct memory access network. The method includes steps S10 to S30:

Step S10: in response to an access request sent by a client, obtaining path information of a node to be accessed from the access request;

It should be noted that the metadata management system is designed for a separated persistent memory architecture, and it has three parts:

- The server is the execution subject of this embodiment. It runs on a computing node of the architecture and provides metadata services.
- The memory pool runs on a memory node of the architecture and stores data such as file metadata and directory metadata.
- The client provides a file access interface for upper-layer applications. It can run on any machine that is able to communicate with the server.

A metadata management system may contain multiple servers and multiple memory pools. Servers communicate with memory pools over the RDMA network, and servers can also communicate with one another over RDMA.

The access request instructs the server to provide a metadata service for the client, such as reading or writing metadata; it may be a read request or a write request. The node to be accessed is the node holding the metadata that the client wants to read or modify.

The path information consists of multiple path nodes concatenated in order and describes the path that leads to the node to be accessed. The node to be accessed is the last node in the path information. For example, when the path information is 0/A/E/I, I is the last path node and is therefore the node to be accessed.

Exemplarily, step S10 includes: in response to an access request sent by the client, determining the path information of the node to be accessed from the access request. The access request may also carry a client identifier, which is used to distinguish different clients.
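The extraction in step S10 can be sketched in Python as follows. This is only a sketch: the request object and its field names (`op`, `path`, `client_id`) are hypothetical and are not defined by the application. The sketch accepts both path notations used in this description, `0/A/E/I` with an explicit root name and `/A/E/I`, and treats the last path node as the node to be accessed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ROOT = "/"  # name used for the root when the path is written as "/A/E/I"

@dataclass
class AccessRequest:
    op: str                          # hypothetical: "read" or "write"
    path: str                        # path information, e.g. "0/A/E/I" or "/A/E/I"
    client_id: Optional[str] = None  # optional client identifier

def path_nodes(path: str) -> List[str]:
    """Split path information into path nodes, root directory node first."""
    nodes = [part for part in path.split("/") if part]
    if path.startswith("/"):
        nodes.insert(0, ROOT)        # a leading "/" denotes the root directory node
    if not nodes:
        raise ValueError("empty path information")
    return nodes

def handle_access_request(req: AccessRequest) -> Tuple[List[str], str]:
    """Step S10 sketch: return the path nodes and the node to be accessed."""
    nodes = path_nodes(req.path)
    return nodes, nodes[-1]          # the last path node is the node to be accessed
```

For example, `handle_access_request(AccessRequest("read", "0/A/E/I", "client-7"))` yields the nodes `["0", "A", "E", "I"]` with `"I"` as the node to be accessed.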

In a feasible embodiment, establishing the communication connection between the server and the memory pool through the remote direct memory access network may specifically include steps S01 and S02:

Step S01: calling the UDP network to send a connection request to a preset UDP port of the memory pool, so that the memory pool receives the request and, in response, sends RDMA connection information to the server;

Step S02: establishing a remote direct memory access network connection with the memory pool according to the received RDMA connection information.

It should be noted that the UDP network provides a communication channel between the server and the memory pool before any remote direct memory access connection exists between them. Over this channel, the two sides set up the RDMA connection, which completes the initialization of the server.

The preset UDP port is a port designated by the memory pool for listening for connection requests from servers. A connection request asks the memory pool to set up an RDMA connection.

The RDMA connection information is the RDMA configuration of the memory pool. It includes the data needed to establish the RDMA connection, such as the RDMA QP number and the RDMA port. An RDMA QP (Queue Pair) is a communication endpoint in Remote Direct Memory Access. It lets two nodes access memory directly, which enables zero-copy data transfer and low-latency interaction. Each QP contains a send queue (Send Queue) and a receive queue (Receive Queue), and the RDMA QP number distinguishes different QPs.
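The RDMA connection information sent back over UDP in step S01 can be encoded as a small fixed-size message. The following is a sketch under stated assumptions. Only the QP number and the RDMA port come from the description. The magic tag, the field widths and the class name `RdmaConnInfo` are hypothetical.

```python
import struct
from dataclasses import dataclass

# Hypothetical reply format sent by the memory pool in step S01.
# Only the QP number and RDMA port are named in the description; the magic
# tag and field widths are assumptions made for this sketch.
CONN_INFO_MAGIC = 0x52444D41   # ASCII "RDMA", used to reject stray datagrams
CONN_INFO_FMT = "!IIH"         # magic, QP number, RDMA port; network byte order

@dataclass(frozen=True)
class RdmaConnInfo:
    qp_num: int   # InfiniBand QP numbers are 24-bit; stored here in 32 bits
    port: int     # RDMA port number on the memory pool's NIC

    def pack(self) -> bytes:
        """Serialize for the UDP reply."""
        return struct.pack(CONN_INFO_FMT, CONN_INFO_MAGIC, self.qp_num, self.port)

    @classmethod
    def unpack(cls, data: bytes) -> "RdmaConnInfo":
        """Parse a UDP reply received by the server in step S02."""
        if len(data) != struct.calcsize(CONN_INFO_FMT):
            raise ValueError("unexpected message length")
        magic, qp_num, port = struct.unpack(CONN_INFO_FMT, data)
        if magic != CONN_INFO_MAGIC:
            raise ValueError("not an RDMA connection-info message")
        return cls(qp_num, port)
```

This format is deliberately minimal. A real reliable-connection QP setup would typically also exchange the peer's LID or GID and an initial packet sequence number, because the verbs API needs these when it moves the QP to the ready-to-receive state.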

Exemplarily, after receiving the RDMA connection information sent by the memory pool, the server extracts the data needed for the RDMA connection, such as the RDMA QP number and the RDMA port, and establishes the RDMA connection with the memory pool.

While establishing the connection, the server also obtains two things for each open memory region of the memory pool: its address information and the access key (Remote Key) needed to access it. An open memory region is a region that can be accessed directly over the RDMA network. The server stores both, so that when it later needs data it can locate the region from its address information and access it with the corresponding key. Requiring the key also protects the data in the memory pool against malicious access.

Before the server is initialized, that is, before the server establishes the RDMA connection with the memory pool, the memory pool must also be initialized. Specifically, initializing the memory pool includes:

- the memory pool determines the memory regions that can be accessed through the RDMA network and configures RDMA access for those regions; and
- the memory pool defines the data structures it stores.
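The address information and Remote Key that the server saves can be kept in a small table, which the server consults before issuing an RDMA operation. The following is a minimal sketch. The names `OpenRegion` and `RegionTable` are hypothetical, and the table format is not specified by the application.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class OpenRegion:
    base_addr: int   # remote virtual address of the registered region
    length: int      # size of the region in bytes
    rkey: int        # Remote Key obtained when the connection was established

class RegionTable:
    """Server-side record of the memory pool's open regions (hypothetical)."""

    def __init__(self) -> None:
        self._regions: List[OpenRegion] = []

    def add(self, region: OpenRegion) -> None:
        self._regions.append(region)

    def rkey_for(self, addr: int, size: int) -> int:
        """Return the rkey of the region covering [addr, addr + size), or raise."""
        for r in self._regions:
            if r.base_addr <= addr and addr + size <= r.base_addr + r.length:
                return r.rkey
        raise PermissionError(f"no open region covers 0x{addr:x}+{size}")
```

In real RDMA hardware the memory pool's NIC enforces the rkey and the region bounds. A local check like this only lets the server avoid posting requests that would fail with a remote access error.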

The memory pool may determine the memory regions reachable over the RDMA network as follows. The memory pool may select any memory node as a memory region, and it may select multiple memory regions. After a memory region is determined, an RDMA MR (Memory Region) is created over it and registered with the network card. This configures RDMA access for the region, so that a server running on a computing node can access that region of the memory pool directly. The memory pool may store the memory address of each RDMA-enabled region together with the region's access key.

After initialization, the memory pool designates a UDP (User Datagram Protocol) port as the preset UDP port and listens on it for connection requests from servers. When it detects a connection request, the memory pool sends its RDMA connection information to the server, which establishes the RDMA connection between the server and the memory pool.
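The memory pool's side of this initialization can be sketched as two operations: registering regions, and answering a connection request that arrives on the preset UDP port. Everything here is an assumption for illustration. The request payload `CONNECT_REQUEST`, the reply layout and the class `MemoryPool` are hypothetical. The random rkey only stands in for the key that a real MR registration (for example `ibv_reg_mr` in libibverbs) would return.

```python
import secrets
import struct
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

CONNECT_REQUEST = b"MDS-CONNECT"   # hypothetical payload of a server's connection request

@dataclass
class MemoryPool:
    qp_num: int
    rdma_port: int
    # (base address, length, rkey) of every region opened for RDMA access
    regions: List[Tuple[int, int, int]] = field(default_factory=list)

    def register_region(self, base: int, length: int) -> int:
        # Stand-in for registering an RDMA MR; a real rkey is issued by the
        # RDMA driver when the MR is registered, not generated randomly.
        rkey = secrets.randbits(32)
        self.regions.append((base, length, rkey))
        return rkey

    def handle_datagram(self, payload: bytes) -> Optional[bytes]:
        """Reply to a connection request arriving on the preset UDP port."""
        if payload != CONNECT_REQUEST:
            return None                                  # ignore unrelated datagrams
        # QP number, RDMA port, then the address and key of each open region
        reply = struct.pack("!IHH", self.qp_num, self.rdma_port, len(self.regions))
        for base, length, rkey in self.regions:
            reply += struct.pack("!QQI", base, length, rkey)
        return reply
```

A real memory pool would call `handle_datagram` from a loop around `socket.recvfrom` and `socket.sendto` on a UDP socket bound to the preset port. Keeping the handler free of sockets makes it testable without a network.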

Defining the stored data structures includes determining that they comprise an FSHeader and MDBs:

- The FSHeader is the header of the metadata management system and stores global information.
- The MDBs store file and directory metadata. They consist of multiple MDBs (Metadata Buckets). An MDB is the basic unit that the server reads from the memory pool, and also the basic unit that the server locks when modifying metadata.

Each MDB consists of an MDBHeader (the MDB header) and multiple Hash Buckets. The MDBHeader stores the MDB's status information, and the Hash Buckets store metadata.

There are three types of metadata (Metadata):

- RegularFile represents ordinary file metadata.
- Directory represents directory metadata.
- DirExtend represents a directory extension.

Each metadata entry, whatever its type, may include a preset extension identification area. This area indicates whether DirExtend metadata exists for the entry. If the area holds the identifier of an extended metadata bucket (extended MDB), DirExtend metadata exists; if the area is empty, it does not.

DirExtend metadata removes the limit on the number of child nodes that the finite capacity of a single MDB would otherwise impose. When the server creates a new child node under a directory in the memory pool, it computes the hash value of the child node and places the child's metadata in the corresponding Hash Bucket of the MDB that holds the directory. If that Hash Bucket is full, the server does three things:

- it creates DirExtend metadata for the directory in a new MDB, namely the extended metadata bucket, to represent the extension;
- it places the child's metadata in the extended metadata bucket; and
- it records the identifier of the extended metadata bucket in the directory's extension identification area.
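The following sketch models the MDB layout and the DirExtend spill-over described above. The 11 buckets per MDB follow FIG. 3. The bucket capacity of 2 entries, the class names and the rule of placing the DirExtend record at the parent's bucket index are assumptions. The sketch also appends each new extended MDB at the end instead of keeping it adjacent to the parent's MDB.

```python
from dataclasses import dataclass, field
from typing import List, Optional

N_BUCKETS = 11        # FIG. 3 shows 11 Hash Buckets per MDB
BUCKET_CAPACITY = 2   # hypothetical number of entries one Hash Bucket can hold

@dataclass
class Metadata:
    name: str
    kind: str                      # "RegularFile", "Directory" or "DirExtend"
    ext_mdb: Optional[int] = None  # extension identification area; None means empty

@dataclass
class MDB:
    header: dict = field(default_factory=dict)         # MDBHeader: status information
    buckets: List[List[Metadata]] = field(
        default_factory=lambda: [[] for _ in range(N_BUCKETS)])

class MemoryPoolLayout:
    def __init__(self) -> None:
        self.mdbs: List[MDB] = [MDB()]                  # MDB 0: root metadata bucket

    def _dir_extend(self, mdb_id: int, parent: Metadata, parent_hash: int) -> Metadata:
        """Find the parent's DirExtend record inside extended MDB `mdb_id`."""
        bucket = self.mdbs[mdb_id].buckets[parent_hash % N_BUCKETS]
        return next(m for m in bucket if m.kind == "DirExtend" and m.name == parent.name)

    def create_child(self, parent: Metadata, parent_hash: int, parent_mdb: int,
                     child: Metadata, child_hash: int) -> int:
        """Insert child metadata, spilling into DirExtend MDBs; returns the MDB used."""
        mdb_id, owner = parent_mdb, parent
        while True:
            bucket = self.mdbs[mdb_id].buckets[child_hash % N_BUCKETS]
            if len(bucket) < BUCKET_CAPACITY:
                bucket.append(child)
                return mdb_id
            if owner.ext_mdb is None:
                # Bucket full and no extension yet: allocate an extended MDB and
                # record its id in the owner's extension identification area.
                self.mdbs.append(MDB())
                owner.ext_mdb = len(self.mdbs) - 1
                self.mdbs[owner.ext_mdb].buckets[parent_hash % N_BUCKETS].append(
                    Metadata(parent.name, "DirExtend"))
            mdb_id = owner.ext_mdb
            owner = self._dir_extend(mdb_id, parent, parent_hash)
```

In this model, the third child that hashes to an already full bucket of the parent's MDB lands in a newly created extended MDB, which is linked from the parent's extension identification area.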

Because the memory pool supports DirExtend metadata, the server places each new directory or file, as far as possible, either in the same MDB as its parent directory or in an MDB adjacent to the parent's MDB. The server always reads metadata from the memory pool in units of MDBs, so it can quickly fetch several nearby MDBs together. This raises the probability that the target metadata is already in the MDBs read back. It avoids frequent network I/O, reduces the number of network I/Os per access, and achieves high-throughput, low-latency access.

For a better understanding of this embodiment, refer to FIG. 2, a schematic diagram of a separated persistent memory architecture cluster, which shows how the servers and the memory pool are connected. The memory pool contains many storage modules 1 for storing data, a processor 2, and a communication component 3 for RDMA network communication. Each server likewise has a storage module 1, multiple processors 2, and a communication component 3 for RDMA network communication. Servers can communicate with one another over RDMA, and each server can also communicate with the memory pool over RDMA.
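Fetching several nearby MDBs together amounts to issuing one RDMA Read over a contiguous byte range. The following is a minimal sketch. It assumes that MDBs have a fixed size and sit contiguously after the FSHeader; the sizes and the function names `mdb_offset` and `batched_read` are hypothetical.

```python
from typing import Tuple

FSHEADER_SIZE = 4096       # hypothetical size of the FSHeader at the start of the pool
MDB_SIZE = 64 * 1024       # hypothetical fixed MDB size

def mdb_offset(mdb_id: int) -> int:
    """Offset of an MDB, assuming MDBs are laid out contiguously after the FSHeader."""
    return FSHEADER_SIZE + mdb_id * MDB_SIZE

def batched_read(mdb_id: int, neighbours: int, total_mdbs: int) -> Tuple[int, int, range]:
    """One RDMA Read covering the wanted MDB plus `neighbours` MDBs on each side.

    Returns (remote offset, length in bytes, ids of the MDBs covered)."""
    first = max(0, mdb_id - neighbours)
    last = min(total_mdbs - 1, mdb_id + neighbours)
    return mdb_offset(first), (last - first + 1) * MDB_SIZE, range(first, last + 1)
```

The trade-off is a larger transfer in exchange for fewer network round trips. This pays off when, as the placement policy intends, children usually sit in the parent's MDB or in its neighbours.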

Step S20: calling the RDMA network to search the memory pool of the metadata management system for the memory address of the node to be accessed, according to the path information;

It should be noted that the memory address tells the server where to access the access metadata of the node to be accessed. Because an RDMA connection exists between the server and the memory pool, the server can access the memory pool directly over the RDMA network.

Exemplarily, the server sends a memory access request to the memory pool over the RDMA network. The request carries memory access information: the memory addresses of the memory pool's regions stored by the server, and the memory key of each region. The memory pool reads the memory access information from the request and checks whether it has a target memory region matching that information.

- If a matching target memory region exists, the memory pool has a region at the given address and the key sent by the server equals that region's key. The memory pool then grants the server access to the target memory region. There may be multiple matching target regions.
- If no matching target region exists, the memory pool denies the server access.

Once the memory pool has granted access, the server calls the remote direct memory access network and applies step-by-step path hashing to the path information to find the memory address of the node to be accessed. The first node in the path information is the root directory node, and the last is the node to be accessed. The server resolves the path nodes one level at a time, starting from the root directory node, to locate the memory address of the node to be accessed.

Step-by-step path hashing works as follows. For each path node in the path information, the node hash value is computed from the node's name and the hash value of its parent node. The node hash values are then used one after another to find the memory address of the node to be accessed. The root directory node comes first in the path information and has no parent, so its node hash value is preset, for example to 0.

When the path information is 0/A/E/I, the parent of A is the root directory node 0, the parent of E is path node A, and the parent of I is path node E. That is, each path node in the path information is the parent of the path node that follows it.
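Step-by-step path hashing can be sketched as a hash chain. The application only states that each node's hash is computed from its parent's hash value and its own name, with the root preset to 0. The mixing function below, which uses `hashlib.blake2b` with an 8-byte digest, is therefore an assumption. This sketch chains the full 64-bit hashes and reduces them modulo the 11 buckets of FIG. 3 only when selecting a Hash Bucket. It does not reproduce the example bucket numbers (4, 6, 1) shown in FIG. 3.

```python
import hashlib
from typing import List, Tuple

ROOT_HASH = 0     # the root directory node's hash is preset; 0 as in the description
N_BUCKETS = 11    # Hash Buckets per MDB in FIG. 3

def node_hash(parent_hash: int, name: str) -> int:
    """Hypothetical hash of a path node from its parent's hash and its own name."""
    h = hashlib.blake2b(digest_size=8)
    h.update(parent_hash.to_bytes(8, "little"))
    h.update(name.encode("utf-8"))
    return int.from_bytes(h.digest(), "little")

def path_hashes(nodes: List[str]) -> List[Tuple[str, int, int]]:
    """Step-by-step path hashing: (node, hash, bucket index), root first."""
    result = [(nodes[0], ROOT_HASH, ROOT_HASH % N_BUCKETS)]
    for name in nodes[1:]:
        h = node_hash(result[-1][1], name)   # chained from the parent's hash
        result.append((name, h, h % N_BUCKETS))
    return result
```

Because every hash depends on the whole chain above it, two files with the same name in different directories normally land in different buckets.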

Furthermore, in a feasible embodiment, step S20 further includes steps S21 to S26:

Step S21: determining the root directory node from the path information, and calling the remote direct memory access network to read, from the memory pool, the root metadata bucket in which the root directory node resides;

It should be noted that the root directory node is the first node in the path information. The server reads the root metadata bucket (root MDB) directly, so it can search for the memory address of the node to be accessed within that bucket on the server side.

Because the server and the memory pool can communicate over the remote direct memory access network, the server can issue an RDMA Read operation to read the root metadata bucket containing the root directory from the memory pool. RDMA Read is a one-sided operation: the server performs it, and the memory pool's processor does not need to take any action.

Exemplarily, the server determines the root directory node from the path information and issues an RDMA Read operation to read, from the memory pool, the root metadata bucket containing the root directory.
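The one-sided read in step S21 can be simulated without RDMA hardware, which makes the division of work visible. The following is a sketch under stated assumptions. The simulated region only checks the key and the bounds, which the NIC does in real hardware, and never runs any memory-pool application code. The offsets, the sizes, the `<QI` MDBHeader layout and the placement of the root MDB right after the FSHeader are all hypothetical.

```python
import struct
from typing import Tuple

FSHEADER_SIZE = 4096     # hypothetical: the FSHeader occupies the first 4 KiB
MDB_SIZE = 4096          # hypothetical MDB size, kept small for the demo
MDB_HEADER_FMT = "<QI"   # hypothetical MDBHeader: 64-bit lock word, 32-bit entry count

class SimulatedRegion:
    """Byte-addressable stand-in for a registered memory region of the pool."""

    def __init__(self, size: int, rkey: int) -> None:
        self.mem = bytearray(size)
        self.rkey = rkey

    def rdma_read(self, addr: int, length: int, rkey: int) -> bytes:
        # One-sided semantics: only the key and bounds are checked (done by the
        # NIC in real hardware); no memory-pool application code runs.
        if rkey != self.rkey or addr < 0 or addr + length > len(self.mem):
            raise PermissionError("remote access error")
        return bytes(self.mem[addr:addr + length])

def read_root_mdb(region: SimulatedRegion, rkey: int) -> Tuple[int, int, bytes]:
    """Step S21 sketch: fetch MDB 0 (assumed to be the root MDB) in one read."""
    raw = region.rdma_read(FSHEADER_SIZE, MDB_SIZE, rkey)
    lock_word, entry_count = struct.unpack_from(MDB_HEADER_FMT, raw, 0)
    return lock_word, entry_count, raw
```

All decoding happens on the server after the bytes arrive. This matches the description's point that the memory node's processor is not involved in the read.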

Step S22: taking the root directory node as the target node, searching the root metadata bucket for the target hash bucket matching the target hash value of the target node, and searching the target hash bucket for the target metadata of the target node;

Step S23: when the access right to the target metadata is open to the client, obtaining the secondary path node of the target node from the path information, and computing the secondary hash value of the secondary path node from the target hash value and the secondary name of the secondary path node;

It should be noted that:

- the target hash bucket is the structure that stores the target metadata of the target node;
- the target metadata is the metadata corresponding to the target node;
- the secondary path node is the node that follows the target node in the path information;
- the secondary name is the name of the secondary path node; and
- the secondary hash value locates where the secondary path node is stored in the root metadata bucket.

Step S24: searching the root metadata bucket for the secondary hash bucket matching the secondary hash value, and, if the secondary hash bucket contains the node metadata of the secondary path node, determining whether the secondary path node is the node to be accessed;

It should be noted that the secondary hash bucket is the structure that stores the metadata of the secondary path node, and the node metadata is the metadata corresponding to the secondary path node. The secondary path node, the secondary name and the secondary hash value have the same meanings as in step S23.

Step S25: if so, determining the memory address according to the secondary hash value.

In this embodiment, the path nodes are checked one level at a time. For each path node, the server checks whether it has a corresponding hash bucket (a target hash bucket or a secondary hash bucket), and whether that hash bucket contains the path node's metadata.

The node hash value of the next path node is computed only when all three of the following hold for the preceding path node:

- it has a hash bucket;
- that hash bucket contains its metadata; and
- access to that metadata is open to the client.

If the secondary path node is not the node to be accessed, the target node is updated to the secondary path node and the target hash value is updated to the secondary hash value. The process then returns to the step of obtaining the secondary path node of the target node from the path information, which is performed when the access right to the target metadata is open to the client.
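The loop of steps S22 to S25 within a single MDB that the server has already read can be sketched as follows. The entry layout (`Entry` with a name and a chained hash), the representation of the MDB as a dict from bucket index to entries, and the injected `hash_fn` and `is_open` callables are hypothetical. A missing node returns `None` here; in the full method, that is the point at which steps S241 to S243 would try the extended metadata bucket.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

N_BUCKETS = 11   # Hash Buckets per MDB, as in FIG. 3

@dataclass
class Entry:
    name: str
    node_hash: int   # chained hash of this node (parent hash + name)

# Hypothetical local copy of one MDB fetched with RDMA Read:
# bucket index -> entries stored in that Hash Bucket.
MDBCopy = Dict[int, List[Entry]]

def find(mdb: MDBCopy, name: str, h: int) -> Optional[Entry]:
    """Look in the Hash Bucket selected by h for the entry with this name and hash."""
    for entry in mdb.get(h % N_BUCKETS, []):
        if entry.name == name and entry.node_hash == h:
            return entry
    return None

def resolve_in_mdb(mdb: MDBCopy, nodes: List[str], root_hash: int,
                   hash_fn: Callable[[int, str], int],
                   is_open: Callable[[Entry], bool]) -> Optional[Tuple[int, int]]:
    """Steps S22-S25 within one MDB; returns (bucket index, hash) of the last node.

    None means the walk stopped because a node was missing or not open."""
    h = root_hash
    entry = find(mdb, nodes[0], h)              # S22: root directory node is the first target
    for name in nodes[1:]:
        if entry is None or not is_open(entry):
            return None
        h = hash_fn(h, name)                    # S23: secondary hash from target hash + name
        entry = find(mdb, name, h)              # S24: the secondary node becomes the next target
    if entry is None:
        return None
    return h % N_BUCKETS, h                     # S25: bucket index locates the metadata
```

The hash function and the permission check are passed in so that the traversal logic stays independent of how hashes and access rights are actually defined.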

Exemplarily, steps S22 to S25 include the following.

The root directory node is taken as the target node. The server searches the root metadata bucket for the target hash bucket matching the target hash value of the target node, and this target hash bucket contains the target metadata of the target node. In other words, when the target node is the root directory node, its metadata can be found in the root metadata bucket.

When the access right to the target metadata is open to the client, the server determines the secondary path node of the target node from the path information. It computes the secondary hash value from the target hash value and the secondary name of the secondary path node. It then searches the root metadata bucket for the secondary hash bucket matching the secondary hash value. If the secondary hash bucket contains the node metadata of the secondary path node, the server determines whether the secondary path node is the node to be accessed:

- If it is not, the target node is updated to the secondary path node, the target hash value is updated to the secondary hash value, and the process returns to step S23.
- If it is, the memory address of the node to be accessed is found from the secondary hash value. Because addresses and hash values are mapped to each other, the secondary hash value identifies the corresponding secondary hash bucket, in which the corresponding metadata can be found.

Whether the access right to the target metadata is open to the client may be determined from the client identifier carried in the access request and a permission viewing table preset in the target metadata:

- If the client identifier is marked as viewable in that table, the access right to the target metadata is open to the client.
- If it is marked as non-viewable, the access right is not open to the client, and access to the memory address of the node to be accessed is stopped.
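The permission check can be sketched as a lookup in a per-metadata table keyed by client identifier. The table shape and the names `NodeMetadata`, `access_open` and `first_denied` are hypothetical. The description does not say what happens to clients missing from the table, so this sketch assumes they are denied.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional

@dataclass
class NodeMetadata:
    name: str
    # Hypothetical permission viewing table: client identifier -> viewable?
    permissions: Dict[str, bool] = field(default_factory=dict)

def access_open(meta: NodeMetadata, client_id: str) -> bool:
    """Open only if the client is marked viewable; unlisted clients are denied (assumption)."""
    return meta.permissions.get(client_id, False)

def first_denied(chain: Iterable[NodeMetadata], client_id: str) -> Optional[str]:
    """Walk metadata root-first and return the first node not open to the client."""
    for meta in chain:
        if not access_open(meta, client_id):
            return meta.name    # resolution stops here; the node is not accessed
    return None
```

For example, a client marked viewable on `/` but not on `A` stops at `A`, and the server never computes the hash of the next path node.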

In a feasible embodiment, following step S24 (which is reached when the access right to the target metadata is open to the client), the method further includes steps S241 to S243:

Step S241: if the secondary hash bucket does not contain the node metadata of the secondary path node, determining whether the parent node of the secondary path node has an extended metadata bucket;

Step S242: when the parent node has an extended metadata bucket, reading the extended metadata bucket from the memory pool through the remote direct memory access network, and searching the extended metadata bucket for the extended hash bucket of the secondary path node;

Step S243: if the extended hash bucket contains the node metadata of the secondary path node and the secondary path node is the node to be accessed, determining the memory address according to the secondary hash value.

It should be noted that the extended metadata bucket stores the extension-type metadata of the parent node. In this embodiment, the parent of the secondary path node is the target node, so the check is whether the target node has an extended metadata bucket. The extended hash bucket is the structure in the extended metadata bucket that holds the node metadata of the secondary path node.

Exemplarily, steps S241 to S243 proceed as follows. If the secondary hash bucket does not contain the node metadata of the secondary path node, the server determines whether the parent node of the secondary path node has an extended metadata bucket.

- If the parent node has no extended metadata bucket, the node to be accessed cannot be reached in the memory pool. Access to it is terminated, and the client may be notified that the node metadata of the node to be accessed cannot be obtained from the memory pool.
- If the parent node has an extended metadata bucket, the server reads it from the memory pool with an RDMA Read operation and searches it for the extended hash bucket of the secondary path node.

The outcome of that search then decides the next step:

- If the extended metadata bucket contains the extended hash bucket, and the extended hash bucket contains the node metadata of the secondary path node, the server performs the determination in step S24 of whether the secondary path node is the node to be accessed.
- If the extended metadata bucket contains the extended hash bucket, but the extended hash bucket does not contain the node metadata of the secondary path node, the secondary path node is updated to the extension node in which the extended hash bucket resides. The process then returns to the step of determining whether the parent node of the secondary path node has an extended metadata bucket.
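The fallback of steps S241 to S243 can be sketched as following a chain of extended MDBs through the extension identification areas. This is only a sketch. The record type `Rec`, the pool view `PoolView` and the assumption that the parent's DirExtend record sits at the parent's bucket index in the extended MDB are hypothetical. Each hop to a new MDB stands for one additional RDMA Read.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

N_BUCKETS = 11

@dataclass
class Rec:
    name: str
    kind: str                      # "Directory", "RegularFile" or "DirExtend"
    ext_mdb: Optional[int] = None  # extension identification area

# Hypothetical view of the pool: MDB id -> bucket index -> records in that Hash Bucket
PoolView = Dict[int, Dict[int, List[Rec]]]

def lookup_child(pool: PoolView, parent: Rec, parent_hash: int, mdb_id: int,
                 child: str, child_hash: int) -> Tuple[Optional[Rec], int, int]:
    """Look up `child` under `parent`, following DirExtend MDBs (S241-S243).

    Returns (record or None, MDB id searched last, extended MDBs read)."""
    owner, extra_reads = parent, 0
    while True:
        for rec in pool[mdb_id].get(child_hash % N_BUCKETS, []):
            if rec.name == child and rec.kind != "DirExtend":
                return rec, mdb_id, extra_reads
        if owner.ext_mdb is None:        # S241: no extended MDB, the lookup fails
            return None, mdb_id, extra_reads
        mdb_id = owner.ext_mdb           # S242: one more RDMA Read in practice
        extra_reads += 1
        # The parent's DirExtend record in that MDB may link to a further extension.
        bucket = pool[mdb_id].get(parent_hash % N_BUCKETS, [])
        owner = next((r for r in bucket
                      if r.kind == "DirExtend" and r.name == parent.name),
                     Rec(parent.name, "DirExtend"))
```

The following example loosely follows FIG. 3, where A is in bucket 4 of MDB0, A's extension is MDB1, and E is in bucket 6 of MDB1. With this layout, `lookup_child(pool, a, 4, 0, "E", 6)` finds E in MDB1 after one extra read.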

Taken together, steps S21 to S25 and steps S241 to S243 work as follows. The path nodes in the path information are hashed level by level, and the memory pool is searched for the metadata of each path node in turn. If the metadata of some path node is not found, the server checks whether the parent of that path node has an extended metadata bucket.

- If it does, the server reads the extended metadata bucket from the memory pool through the remote direct memory access network and searches it for the metadata of that path node.
- If it does not, access to the node to be accessed is stopped.

This embodiment makes full use of the one-sided read/write capability of the remote direct memory access network, which reads and writes memory in the memory pool directly while bypassing the processor of the memory node. In a traditional distributed file system cluster, nodes mostly communicate by remote procedure call (RPC): a request is sent to the peer, the peer processes it, and the peer returns the data. In the disaggregated persistent memory architecture, however, the memory node hosting the memory pool has only limited computing resources and cannot carry a large volume of complex computation such as path resolution. If the server on a compute node sent metadata requests directly to the memory pool and left the complex computation to the memory pool, the processor of the memory node would inevitably be overwhelmed. In this embodiment of the present application, therefore, the server performs computation such as path resolution itself, and all reads and writes use RDMA one-sided operations, which bypass the processor of the memory node and thereby improve the speed of reading and writing data.

For a better understanding of this embodiment, the path resolution process is described with reference to FIG. 3, a schematic diagram of how a directory tree is stored in the memory pool. FIG. 3 contains a directory tree with the nodes /, A, B, C, D, E, F, J, H and I, where / denotes the root directory node. Different display styles in FIG. 3 distinguish directory-type metadata, directory-extension-type metadata and file-type metadata. MDB0 is the root metadata bucket and MDB1 is an extended metadata bucket. Each of MDB0 and MDB1 contains 11 hash buckets, numbered 0 to 10 from left to right; for example, the root node metadata of the root directory node "/" is in hash bucket 0 of MDB0, and the node metadata of A is in hash bucket 4 of MDB0. In MDB0, "/, A, F, B, C" are directory-type metadata and "J, G" are file-type metadata; in MDB1, "E, D" are directory-type metadata, "I, H" are file-type metadata, and A is directory-extension-type metadata.

With reference to FIG. 3, path resolution proceeds as follows when the node to be accessed is I and the path information to be resolved is /A/E/I. The server locates the root metadata bucket containing the root directory node, MDB0. Using the hash value of the root directory node, which is fixed at 0, it finds the root node metadata in hash bucket 0 of MDB0 and checks whether that metadata grants access to the client that initiated the access request. If access is granted, the server takes the hash value of the root directory node and the name of the next path node, A, as input, computes the node hash value of A as 4, and searches hash bucket 4 of MDB0 for the node metadata of A. If the node metadata of A is open to the client, the server takes A's node hash value 4 and the name of the next path node, E, computes the node hash value of E as 6, and searches hash bucket 6 of MDB0 for the node metadata of E. This search fails, so the server reads A's extended metadata bucket, MDB1, back from the memory pool via RDMA Read and, using E's node hash value 6, finds the metadata of E in hash bucket 6 of MDB1. If the node metadata of E is open to the client, the server takes E's node hash value 6 and the name of the next path node, I, computes the hash value of I as 1, and searches hash bucket 1 of MDB1 for the node metadata of I. If the access request is a read request, the node metadata of I is returned to the client.
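The walkthrough above can be sketched as follows. This is a minimal local simulation, not the patented implementation. The memory pool is a Python dictionary, and `rdma_read()` stands in for a one-sided RDMA Read of a whole MDB. The per-level hash `node_hash` (CRC32 of the parent hash and node name, modulo 11) is a hypothetical choice, since the text does not name the hash function. The bucket numbers therefore differ from the 4, 6 and 1 of FIG. 3, but the control flow follows the description: a permission check at each level, a fallback to the parent's extended metadata bucket, and termination when no extended bucket exists.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

NUM_BUCKETS = 11  # FIG. 3 shows 11 hash buckets (No. 0 to No. 10) per MDB
ROOT_HASH = 0     # the root directory node's hash value is fixed at 0


def node_hash(parent_hash: int, name: str) -> int:
    """Hypothetical level-by-level hash of (parent hash, node name)."""
    return zlib.crc32(f"{parent_hash}:{name}".encode()) % NUM_BUCKETS


@dataclass
class Meta:
    name: str
    kind: str                      # "dir" or "file"
    parent_hash: int
    ext_mdb: Optional[int] = None  # id of this directory's extended MDB, if any
    open_to: Tuple[str, ...] = ("client-1",)


class MemoryPool:
    """Local stand-in for the remote pool; rdma_read() models one RDMA Read."""

    def __init__(self) -> None:
        self.mdbs: Dict[int, List[List[Meta]]] = {}
        self.reads = 0

    def new_mdb(self, mdb_id: int) -> None:
        self.mdbs[mdb_id] = [[] for _ in range(NUM_BUCKETS)]

    def rdma_read(self, mdb_id: int) -> List[List[Meta]]:
        self.reads += 1
        return self.mdbs[mdb_id]

    def put(self, mdb_id: int, bucket: int, meta: Meta) -> None:
        self.mdbs[mdb_id][bucket].append(meta)


def find(mdb: List[List[Meta]], bucket: int, parent_hash: int, name: str) -> Optional[Meta]:
    # Match parent hash as well as name so equal names under different parents
    # that land in the same bucket are told apart.
    return next((m for m in mdb[bucket] if m.parent_hash == parent_hash and m.name == name), None)


def resolve(pool: MemoryPool, path: str, client: str, root_mdb: int = 0) -> Optional[Meta]:
    mdb = pool.rdma_read(root_mdb)
    cur, cur_hash = find(mdb, ROOT_HASH, -1, "/"), ROOT_HASH
    for name in (p for p in path.split("/") if p):
        if cur is None or client not in cur.open_to:
            return None                        # missing node or no permission
        h = node_hash(cur_hash, name)
        nxt = find(mdb, h, cur_hash, name)
        if nxt is None:
            if cur.ext_mdb is None:
                return None                    # parent has no extended bucket: stop
            mdb = pool.rdma_read(cur.ext_mdb)  # read the parent's extended MDB
            nxt = find(mdb, h, cur_hash, name)
        cur, cur_hash = nxt, h
    if cur is None or client not in cur.open_to:
        return None
    return cur


def build_fig3_like_pool() -> MemoryPool:
    """A small tree shaped like /A/E/I in FIG. 3, with E and I stored in MDB1."""
    pool = MemoryPool()
    pool.new_mdb(0)
    pool.new_mdb(1)
    pool.put(0, ROOT_HASH, Meta("/", "dir", -1))
    h_a = node_hash(ROOT_HASH, "A")
    pool.put(0, h_a, Meta("A", "dir", ROOT_HASH, ext_mdb=1))
    h_e = node_hash(h_a, "E")
    pool.put(1, h_e, Meta("E", "dir", h_a))
    pool.put(1, node_hash(h_e, "I"), Meta("I", "file", h_e))
    return pool
```

With the tree built by `build_fig3_like_pool()`, resolving `/A/E/I` for `client-1` returns the metadata of I after two reads of the pool. The first read fetches MDB0, and the second fetches A's extended bucket MDB1, mirroring the RDMA Read in the walkthrough.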

Step S30: obtain the access metadata of the node to be accessed through the memory address, and perform a data operation on the access metadata.

It should be noted that, once the memory address corresponding to the node to be accessed has been determined, the access metadata of that node can be read through the memory address, and a data operation can then be performed on it. Data operations include read operations and write operations. Write operations include creation operations, such as creating directory metadata or file metadata, and modification operations. For example, the access metadata of the node to be accessed is read from the memory address. Whether a read or a write is then performed on it is determined by the access request sent by the client, which may be either a read request or a write request.

Further, referring to FIG. 4, another embodiment of the present application is based on the above embodiment. Content identical or similar to the above embodiment can be found in the description above and is not repeated here. On this basis, and referring to FIG. 4, step S30 further includes steps S31 to S33:

Step S31: when the access request is a write request, invoke a remote direct memory access replacement operation to determine whether the lock area of the access metadata bucket containing the access metadata is empty;

It should be noted that a write request leads to a write operation on the access metadata. The access metadata bucket is the storage structure that stores the access metadata. Every MDB has a lock area, so the access metadata bucket has one as well. An empty lock area means that no server is currently responsible for writing the metadata in the access metadata bucket. A non-empty lock area means that some server currently holds that responsibility. The remote direct memory access replacement operation is an RDMA CAS operation. For example, step S31 includes: when the access request is a write request, obtaining the lock area of the access metadata bucket containing the access metadata and determining whether the lock area is empty. In this embodiment the access request may also be a read request. In that case step S30 may further include step S34: when the access request is a read request, returning the read access metadata to the client.
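The following is a minimal sketch of the step S31 decision. It assumes, hypothetically, that the lock area starts with an 8-byte lock word that is zero when empty and otherwise holds the identifier of the server that claimed it. RDMA atomic operations act on 8-byte words, so a single compare-and-swap can both test the lock area and claim it. In libibverbs this would be posted as an `IBV_WR_ATOMIC_CMP_AND_SWP` work request. Here `SimulatedMemoryNode`, `rdma_cas`, `decide_role` and `race` are illustrative names, and a Python lock emulates the atomicity the NIC provides.

```python
import threading
from typing import Dict, List

EMPTY = 0  # an all-zero lock word means "no modification master yet"


class SimulatedMemoryNode:
    """Stand-in for the memory node: each MDB's lock word is one 8-byte value.

    Real RDMA atomics are executed by the memory node's NIC, not its CPU;
    a Python lock emulates that atomicity here.
    """

    def __init__(self) -> None:
        self._nic = threading.Lock()
        self.lock_words: Dict[int, int] = {}

    def rdma_cas(self, mdb_id: int, expected: int, desired: int) -> int:
        """Compare-and-swap; like RDMA CAS, always returns the previous value."""
        with self._nic:
            old = self.lock_words.get(mdb_id, EMPTY)
            if old == expected:
                self.lock_words[mdb_id] = desired
            return old


def decide_role(node: SimulatedMemoryNode, mdb_id: int, server_id: int) -> str:
    # Step S31: one CAS both tests the lock area for emptiness and claims it.
    old = node.rdma_cas(mdb_id, EMPTY, server_id)
    return "master" if old == EMPTY else "slave"  # S32 if it was empty, else S33


def race(n_servers: int, mdb_id: int = 7) -> Dict[str, List[int]]:
    """Let n_servers (ids 1..n, never 0) contend for the same MDB's lock area."""
    node = SimulatedMemoryNode()
    roles: Dict[str, List[int]] = {"master": [], "slave": []}
    guard = threading.Lock()

    def worker(sid: int) -> None:
        role = decide_role(node, mdb_id, sid)
        with guard:
            roles[role].append(sid)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(1, n_servers + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    roles["owner"] = [node.lock_words[mdb_id]]
    return roles
```

Running `race(8)` starts eight servers against the same MDB. Exactly one CAS observes the empty value, so exactly one server proceeds to step S32 and the other seven take the step S33 path.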

Step S32: if the lock area is empty, determine that the server is the modification master node. Write the master configuration information of the modification master node into the lock area of the access metadata bucket through the remote direct memory access replacement operation, thereby obtaining write permission on the access metadata bucket, and perform a write operation on the access metadata in the access metadata bucket;

In this embodiment, if multiple servers perform write operations on the same access metadata bucket, a modification master node and one or more modification slave nodes must be determined among them. Write permission on the access metadata bucket is granted only to the modification master node. The modification master node can write the metadata in the access metadata bucket directly. It also receives, from the modification slave nodes, the metadata they wish to modify, and performs those writes in the access metadata bucket on their behalf. Write permission is not granted directly to the modification slave nodes: a modification slave node sends the metadata it wishes to modify to the modification master node, which then modifies that metadata in the access metadata bucket. The master configuration information is the configuration information that the modification master node needs when writing to the access metadata bucket. It includes node configuration information and remote direct memory access configuration information.

For example, step S32 includes the following: if the lock area is empty, the server is determined to be the modification master node. The server then invokes the remote direct memory access replacement operation to write its master configuration information into the lock area of the access metadata bucket, thereby obtaining write permission on the access metadata bucket, and performs a write operation on the access metadata in the access metadata bucket.

In a feasible implementation, step S32 further includes steps S321 to S324:

Step S321: write the node configuration information of the modification master node into the lock area of the access metadata bucket through the remote direct memory access replacement operation. Then create a log space in the modification master node and determine the remote direct memory access configuration information of the log space;

It should be noted that the remote direct memory access replacement operation is the RDMA CAS (Compare-and-Swap) operation of the RDMA network, used here to write data into the lock area of an MDB. The node configuration information describes the server's own node, for example the identifier, network address and open port of the modification master node. The open port gives modification slave nodes an entry point for accessing the modification master node. The log space records data operations that modify metadata in the access metadata bucket. It also receives and stores the metadata that modification slave nodes wish to modify. The log space consists of an index area, a log area and a result area. The index area stores an index increment, and each index increment is associated with a log area and a result area. The log space can hold multiple index increments, and each index increment corresponds to exactly one modification slave node, so the modification master node can receive metadata to be modified from multiple modification slave nodes. The remote direct memory access configuration information comprises the address of the log space and its log key. The log key is the RKey (remote key) of the RDMA MR (Memory Region) registered for the log space.
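The contents of the lock area and log space described above could be laid out as fixed-size binary records. The sketch below shows one hypothetical layout; the text does not give one. The 8-byte master identifier comes first so that it stays 8-byte aligned for the RDMA CAS in step S321. It is followed by the master's IPv4 address and open port (node configuration information), then the log space address and RKey (remote direct memory access configuration information). The class `LogSpaceLayout` likewise assumes an 8-byte index counter followed by fixed-size log slots and 8-byte result slots.

```python
import socket
import struct
from dataclasses import dataclass

# Hypothetical little-endian lock-area layout (not specified in the text):
#   u64 master node id  (the word claimed by RDMA CAS; 0 = empty, kept first for alignment)
#   4s  IPv4 address, u16 open port               -> node configuration information
#   u64 log space address, u32 RKey of its MR     -> RDMA configuration information
LOCK_AREA = struct.Struct("<Q4sHQI")


@dataclass
class MasterConfig:
    node_id: int
    ip: str
    port: int
    log_addr: int
    rkey: int


def pack_master_config(cfg: MasterConfig) -> bytes:
    return LOCK_AREA.pack(cfg.node_id, socket.inet_aton(cfg.ip), cfg.port, cfg.log_addr, cfg.rkey)


def unpack_master_config(raw: bytes) -> MasterConfig:
    node_id, ip, port, log_addr, rkey = LOCK_AREA.unpack(raw[:LOCK_AREA.size])
    return MasterConfig(node_id, socket.inet_ntoa(ip), port, log_addr, rkey)


@dataclass
class LogSpaceLayout:
    """Offsets inside the master's log space (hypothetical fixed-size slots)."""
    base: int
    slots: int
    slot_size: int

    @property
    def index_area(self) -> int:
        return self.base  # one 8-byte counter, the target of RDMA FAA

    def log_slot(self, idx: int) -> int:
        return self.base + 8 + idx * self.slot_size

    def result_slot(self, idx: int) -> int:
        return self.base + 8 + self.slots * self.slot_size + idx * 8
```

Packing the record once and writing it with a single one-sided write keeps the lock area compact. A slave then needs only one read to learn where to connect and which RKey to present.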

Step S322: register the remote direct memory access configuration information of the log space on the modification master node, and write it into the lock area to obtain write permission on the access metadata bucket;

It should be noted that write permission on the access metadata bucket is obtained by writing both the remote direct memory access configuration information and the node configuration information into the lock area. The lock area then holds the address of the log space, its log key and the node configuration information. A modification slave node can therefore read the network address of the modification master node from the lock area and use it to reach the modification master node. It can also read the log space address and log key, which let it access the modification master node's log space. In this embodiment, the remote direct memory access configuration information of the log space is registered on the modification master node. Modification slave nodes can then access its log space directly over the remote direct memory access network, which improves the access speed between servers.

Step S323: once write permission on the access metadata bucket has been obtained, perform the write operation on the access metadata.

It should be noted that once the server becomes the modification master node, it can write the access metadata directly and return the result of the write directly to the client. The steps run as follows:
- The remote direct memory access replacement operation is invoked to write the node configuration information of the modification master node into the lock area of the access metadata bucket.
- A log space is created in the modification master node, and its remote direct memory access configuration information is determined.
- That configuration information is registered on the network card of the modification master node and written into the lock area, which grants write permission on the access metadata bucket.
- With that permission, the access metadata is written. After the write completes, its result can be returned directly to the client.

The modification master node can write multiple pieces of access metadata.

In a feasible embodiment, after write permission on the access metadata bucket has been obtained in step S323, the method further includes steps S324 to S325:

Step S324: monitor whether the index area of the log space has been updated. If it has, obtain the index increment from the index area, and obtain the metadata to be modified and its modification information from the log area associated with the index increment;

Step S325: modify the metadata to be modified in the access metadata bucket according to the modification information, and update its modification status in the result area associated with the index increment.

It should be noted that once the server becomes the modification master node, it must monitor whether the index area of its log space has been updated. An update of the index area indicates that a modification slave node has accessed the master's log space and may be sending metadata it wishes to modify. Therefore, when an update of the index area is detected, the master does three things. It obtains the index increment from the index area. It obtains the metadata to be modified and its modification information from the log area associated with that index increment. It then modifies the metadata in the access metadata bucket. The modification status indicates whether the metadata to be modified has been modified. Before the modification it may hold a default status. Afterwards it may be updated to a status such as "modification completed" or "modified successfully". This embodiment does not impose a specific limitation here. The modification information is the content of the intended change, carried with the metadata to be modified.

The log area can hold multiple pieces of metadata to be modified in separate entries, and the result area can likewise hold the modification records of multiple pieces of metadata in separate entries. A given piece of metadata may occupy the same position in both areas. Before the modification master node modifies the metadata, the corresponding position in the result area shows the default modification status. After the modification succeeds, the master updates the status at that position, which notifies the modification slave node that its metadata has been modified successfully.

For example, the master monitors whether the index area of the log space has been updated. If it has, the master obtains the index increment from the index area, and obtains the metadata to be modified and its modification information from the log area associated with that index increment. The master then modifies the metadata in the access metadata bucket according to the modification information, and updates its modification status in the result area associated with the index increment. An update of the index increment means that the index area has been updated.

This embodiment creates, in the modification master node, a log space that supports remote direct memory access network communication. Modification slave nodes can therefore access the modification master node directly, without the communication passing through the servers' processors, which improves access speed. In addition, an index area is set up in the log space so that each modification slave node can perform an index self-increment there and obtain its own index increment. The slave's metadata to be modified is then stored in the log area and result area associated with that index increment, which reduces concurrency conflicts between servers.
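The master side of steps S324 and S325 can be sketched as a polling pass over the log space. In this hypothetical single-threaded model, `LogSpace.index` plays the index area, `log` the log area and `result` the result area. The master applies entries in index order and stops at any index that a slave has reserved but not yet filled; it picks that index up on the next pass. Processing strictly in index order is a design choice of this sketch, not something the text prescribes.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

PENDING, DONE = 0, 1  # hypothetical result-area status codes


@dataclass
class LogSpace:
    """Master-local view of its log space (slaves fill it via one-sided RDMA)."""
    index: int = 0                                                  # index area
    log: Dict[int, Tuple[str, str]] = field(default_factory=dict)   # log area
    result: Dict[int, int] = field(default_factory=dict)            # result area


def master_poll_once(space: LogSpace, applied_upto: int, bucket: Dict[str, str]) -> int:
    """One pass of steps S324-S325; returns the next index still to apply."""
    idx = applied_upto
    while idx < space.index:                 # index area advanced: new requests
        entry = space.log.get(idx)
        if entry is None:
            break                            # reserved but not yet written: next pass
        key, value = entry                   # metadata to modify + modification info
        bucket[key] = value                  # S325: modify it in the MDB
        space.result[idx] = DONE             # S325: update its modification status
        idx += 1
    return idx
```

Calling `master_poll_once` in a loop, and feeding its return value back in as `applied_upto`, gives the monitoring behavior of step S324.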

Step S33: if the lock area is not empty, determine that the server is a modification slave node. Obtain the master configuration information from the lock area of the access metadata bucket, and use it to establish a remote direct memory access network connection with the modification master node. Then send the access metadata to the modification master node, which performs the write operation on the access metadata.

It should be noted that the master configuration information is the information written into the lock area by the modification master node. It gives modification slave nodes what they need to access the modification master node. A modification slave node can send access metadata, which may include the metadata to be modified, to the modification master node, so that the access metadata is modified through the modification master node. For example, if the lock area is not empty, the server currently accessing the access metadata bucket is determined to be a modification slave node, because its attempt to lock the lock area of the access metadata bucket has failed. It then obtains the master configuration information of the modification master node from the lock area and establishes a remote direct memory access network connection with the modification master node based on that information. Over that connection it sends the access metadata to the modification master node, which performs the write operation on the access metadata.
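The following sketches how a modification slave node might read the master configuration information in step S33. The lock word is claimed in step S321, but the remote direct memory access configuration information is written into the lock area afterwards, in step S322. A slave can therefore observe a non-empty lock area whose configuration is not yet complete, so the sketch re-reads until it is. Several elements are assumptions made for illustration: the 20-byte layout, the rule that a nonzero log address marks the configuration as complete, and the names `LockArea` and `fetch_master_config`. On real hardware, a one-sided read larger than 8 bytes is not guaranteed to be atomic with respect to concurrent writes, so a production design would need something like a validity flag written last.

```python
import struct
import threading
import time
from typing import Optional, Tuple

# Hypothetical lock-area layout: u64 owner id, u64 log space address, u32 RKey.
LAYOUT = struct.Struct("<QQI")


class LockArea:
    """Byte image of one MDB's lock area; rdma_read/rdma_write model one-sided ops."""

    def __init__(self) -> None:
        self._mem = bytearray(LAYOUT.size)
        self._mu = threading.Lock()

    def rdma_read(self) -> bytes:
        with self._mu:
            return bytes(self._mem)

    def rdma_write(self, offset: int, data: bytes) -> None:
        with self._mu:
            self._mem[offset:offset + len(data)] = data


def fetch_master_config(area: LockArea, timeout_s: float = 1.0) -> Optional[Tuple[int, int, int]]:
    """Slave side of step S33: wait until the master has published its config."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        owner, log_addr, rkey = LAYOUT.unpack(area.rdma_read())
        if owner and log_addr:          # lock claimed (S321) and config written (S322)
            return owner, log_addr, rkey
        time.sleep(0.001)               # config not published yet: read again
    return None
```

With the returned log address and RKey, the slave has what it needs to target the master's log space with one-sided operations in steps S331 and S332.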

In a feasible embodiment, step S33 further includes steps S331 to S333:

Step S331: invoke a remote direct memory access self-increment operation to perform an index self-increment in the index area of the modification master node's log space, obtaining the index increment of the modification slave node;

Step S332: according to the index increment, write the metadata to be modified into the log area associated with the index increment, so that the modification master node can carry out the modification of the metadata held in that log area;

Step S333: poll the result area associated with the index increment. Upon detecting that the modification status of the metadata to be modified has been updated in the result area, return the modification result to the client of the modification slave node.

It should be noted that the remote direct memory access self-increment operation is the RDMA FAA (Fetch-and-Add) operation. Over the remote direct memory access network, a modification slave node performs an RDMA FAA operation on the index area of the modification master node's log space, incrementing the index and obtaining the resulting index increment. The index increment may be a numerical value. Each index increment corresponds to exactly one modification slave node, while one modification slave node may hold multiple index increments. Each time the index area is updated, the new index increment corresponds to a new log area and a new result area. These store the metadata to be modified by the modification slave node holding that increment, and its modification status. The modification result may be, among other things, the modified data itself or the modification status of the metadata.

For example, a modification slave node writes the metadata it wishes to modify into the log area associated with its index increment, and may write multiple pieces of metadata there. The modification master node polls the log area to detect metadata newly written by modification slave nodes. It applies the corresponding modifications and updates the modification status in the result area for each piece of metadata. The modification slave node polls the result area to detect whether its one or more pieces of metadata have been modified; once they have, it returns the modification result to its client. In this embodiment, a modification slave node thus has its metadata modified through the modification master node, which avoids concurrency conflicts between servers.
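The slave side of steps S331 to S333 might look like the sketch below. `RemoteLogSpace` is a hypothetical stand-in for the master's registered log space. `rdma_faa` models the RDMA FAA on the index area and, like fetch-and-add, returns the value before the increment. `rdma_write_log` models a one-sided write into the log area, and `rdma_read_result` a one-sided read of the result area. The master-local accessors exist only so the example can be driven end to end. Reuse of slots as a ring buffer is not handled.

```python
import threading
import time
from typing import Dict, List, Optional, Tuple

PENDING, DONE = 0, 1


class RemoteLogSpace:
    """The master's log space; rdma_* methods model one-sided RDMA from a slave."""

    def __init__(self, slots: int = 16) -> None:
        self._nic = threading.Lock()          # emulates NIC-side atomicity
        self.index = 0                        # index area
        self.log: Dict[int, Tuple[str, str]] = {}
        self.result: List[int] = [PENDING] * slots

    def rdma_faa(self, add: int = 1) -> int:
        with self._nic:                       # fetch-and-add returns the old value
            old = self.index
            self.index += add
            return old

    def rdma_write_log(self, idx: int, entry: Tuple[str, str]) -> None:
        with self._nic:
            self.log[idx] = entry

    def rdma_read_result(self, idx: int) -> int:
        with self._nic:
            return self.result[idx]

    # Master-local accessors: the master reads and writes its own memory.
    def local_entry(self, idx: int) -> Optional[Tuple[str, str]]:
        with self._nic:
            return self.log.get(idx)

    def local_complete(self, idx: int) -> None:
        with self._nic:
            self.result[idx] = DONE


def slave_submit(space: RemoteLogSpace, key: str, value: str, timeout_s: float = 1.0) -> Optional[int]:
    idx = space.rdma_faa(1)                   # S331: reserve a private index
    space.rdma_write_log(idx, (key, value))   # S332: publish the desired change
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:        # S333: poll the result area
        if space.rdma_read_result(idx) == DONE:
            return idx                        # caller reports success to its client
        time.sleep(0.0005)
    return None
```

Each slave receives a distinct index from the fetch-and-add, so concurrent slaves never write the same log slot. This is how the index area keeps their submissions apart.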

The above embodiments show that the servers can coordinate concurrent write conflicts autonomously, by determining a modification master node and modification slave nodes. Write conflicts are thus resolved efficiently without involving the memory pool. No conflicts occur, write efficiency is not reduced, and the computing resources of the memory node are not wasted. In a traditional monolithic server architecture, both computation and storage reside on the server, which receives modification requests from all clients and then coordinates and persists them uniformly. The disaggregated persistent memory architecture, however, separates storage from computation. The memory pool responsible for persistent storage cannot take on a computation as complex as write conflict resolution. The usual approach is for multiple servers to compete for the same lock in the memory pool: the server holding the lock writes its modifications to the memory pool and then releases the lock. Under high concurrency, however, fierce lock contention raises latency significantly. This embodiment instead lets the servers making concurrent modifications determine a modification master node and modification slave nodes among themselves.
The modification master node collects the modifications, and the modification slave nodes submit the metadata they wish to modify to the modification master node without blocking. Conflict resolution is therefore completed at the compute node (server) level, and latency stays low even under high concurrency.

For a better understanding of this embodiment, the write operations of the modification master node and the modification slave node are described with reference to FIG. 5, a schematic diagram of the log space of the modification master node. FIG. 5 shows three components: an MDB in the memory pool, a server acting as the modification master node, and a server acting as a modification slave node. The modification master node contains the log area, the result area, and the index increment in the index area. Together, the index area, log area and result area form the log space of the modification master node.

With reference to FIG. 5, the write process comprises steps S1 to S5. Step S1: the modification master node locks the lock area of the MDB. Step S2: the modification slave node performs an index self-increment in the index area and obtains its index value, 2. Step S3: the modification slave node writes the metadata it wishes to modify into the log area. Step S4: the modification master node updates the modification status of that metadata in the result area; upon detecting the update, the modification slave node can return the corresponding modification result to its client. Step S5: the modification master node finishes modifying the metadata requested by the modification slave node. If neither the index area nor the log area is then updated within a preset time, the MDB can be unlocked, completing the concurrent modification of the MDB. In this way, conflicts caused by concurrent writes are resolved at the compute node level, and latency stays low under high concurrency.
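Steps S1 to S5 can be simulated end to end with threads. In this hypothetical model, `Shared` holds the MDB's lock word and the master's log space, and a Python lock emulates RDMA atomicity. One addition goes beyond the text and is marked as such in the code. Before unlocking in step S5, the master adds a `CLOSED` sentinel to the index area in the same atomic step as its idle check. A slave whose FAA lands after that point sees the sentinel and retries from step S31, instead of waiting on a master that has already left.

```python
import threading
import time
from typing import Dict, List, Optional, Tuple

EMPTY = 0          # lock word value when no modification master holds the MDB
DONE = 1           # result-area status once the master has applied an entry
CLOSED = 1 << 62   # hypothetical sentinel (not in the text): log space closed


class Shared:
    """An MDB lock word (memory pool) plus the master's log space (master memory)."""

    def __init__(self, slots: int = 64) -> None:
        self.nic = threading.Lock()            # emulates RDMA atomicity
        self.lock_word = EMPTY
        self.index = 0                         # index area, target of FAA
        self.log: Dict[int, Tuple[str, str]] = {}
        self.result: List[int] = [0] * slots

    def cas_lock(self, expected: int, desired: int) -> int:
        with self.nic:
            old = self.lock_word
            if old == expected:
                self.lock_word = desired
            return old

    def faa_index(self) -> int:
        with self.nic:
            old = self.index
            self.index += 1
            return old


def run_master(sh: Shared, me: int, bucket: Dict[str, str], idle_s: float) -> bool:
    if sh.cas_lock(EMPTY, me) != EMPTY:        # S1: lock the MDB's lock area
        return False                           # another server is already master
    applied, last = 0, time.monotonic()
    while True:
        with sh.nic:
            entry = sh.log.get(applied)
        if entry is not None:                  # a slave's S3 write has landed
            bucket[entry[0]] = entry[1]        # apply it to the MDB
            with sh.nic:
                sh.result[applied] = DONE      # S4: publish the modification status
            applied, last = applied + 1, time.monotonic()
            continue
        if time.monotonic() - last > idle_s:
            with sh.nic:                       # idle check and close in one atomic step
                quiet = sh.index == applied
                if quiet:
                    sh.index += CLOSED
            if quiet:
                sh.cas_lock(me, EMPTY)         # S5: nothing new for idle_s, unlock
                return True
        time.sleep(0.0005)


def slave_submit(sh: Shared, key: str, value: str, timeout_s: float = 2.0) -> Optional[int]:
    idx = sh.faa_index()                       # S2: reserve an index
    if idx >= CLOSED:
        return None                            # master is leaving: retry from S31
    with sh.nic:
        sh.log[idx] = (key, value)             # S3: write the desired change
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with sh.nic:
            status = sh.result[idx]
        if status == DONE:                     # S4 observed: report to the client
            return idx
        time.sleep(0.0005)
    return None
```

In a run with `idle_s=0.2`, the master unlocks only after 0.2 s without new index reservations, which plays the role of the preset time in step S5. Choosing that interval trades how long the lock is held against how often a new master must be elected.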

The present application also provides a metadata management device, shown in FIG. 6, which is applied to a server configured on a metadata management system, the server being communicatively connected to the memory pool of the metadata management system through a remote direct memory access network. The device includes: a response module 10, configured to respond to an access request sent by a client and obtain the path information of the node to be accessed from the access request; a calling module 20, configured to call the remote direct memory access network and search the memory pool of the metadata management system for the memory address of the node to be accessed according to the path information; and an operation module 30, configured to obtain the access metadata of the node to be accessed through the memory address and perform data operations on the access metadata. The metadata management device adopts the metadata management method of the above embodiment, aiming to solve the technical problem of high access latency caused by the overly deep input/output software stack of traditional file systems. Compared with the prior art, the beneficial effects of the metadata management device provided in this embodiment are the same as those of the metadata management method provided in the above embodiment, and its other technical features are the same as those disclosed for the method above; they are not repeated here.
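The three-module split of FIG. 6 can be pictured as a small pipeline. In this sketch (all class names, the request shape, and the `(bucket, slot)` address type are hypothetical), the lookup used by the calling module is injected as a plain function, where a real device would walk the metadata buckets over RDMA, and a dictionary stands in for the remote memory pool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Address = Tuple[int, int]  # hypothetical (hash bucket, slot) address in the memory pool


@dataclass
class AccessRequest:
    """Hypothetical request shape; the source only says it carries the node's path."""
    path: str
    op: str                          # "read" or "write"
    changes: Optional[dict] = None


class ResponseModule:                # response module 10
    def path_of(self, req: AccessRequest) -> List[str]:
        return [part for part in req.path.split("/") if part]


class CallingModule:                 # calling module 20
    def __init__(self, lookup: Callable[[List[str]], Optional[Address]]):
        self.lookup = lookup         # e.g. an RDMA walk over metadata buckets

    def address_of(self, parts: List[str]) -> Optional[Address]:
        return self.lookup(parts)


class OperationModule:               # operation module 30
    def __init__(self, pool: Dict[Address, dict]):
        self.pool = pool             # stands in for the remote memory pool

    def apply(self, addr: Address, req: AccessRequest) -> dict:
        if req.op == "write":
            self.pool[addr].update(req.changes or {})
        return dict(self.pool[addr])  # return a copy of the node's metadata


class MetadataDevice:
    def __init__(self, response: ResponseModule, calling: CallingModule,
                 operation: OperationModule):
        self.response, self.calling, self.operation = response, calling, operation

    def serve(self, req: AccessRequest) -> dict:
        parts = self.response.path_of(req)
        addr = self.calling.address_of(parts)
        if addr is None:
            raise FileNotFoundError(req.path)
        return self.operation.apply(addr, req)
```

Keeping the lookup injectable lets the same response and operation modules sit in front of different bucket-walking strategies.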

An embodiment of the present application provides an electronic device, which may be a playback device. The electronic device includes at least one processor and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the metadata management method of the above embodiment. FIG. 7 shows a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure. The electronic device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 7, the electronic device may include a processing device 1001 (such as a central processing unit or a graphics processor), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1003 into a random access memory (RAM) 1004. The RAM 1004 also stores various programs and data required for the operation of the electronic device. The processing device 1001, the ROM 1002, and the RAM 1004 are connected to one another via a bus 1005, to which an input/output (I/O) interface 1006 is also connected. Typically, the following devices can be connected to the I/O interface 1006: an input device 1007 including, for example, a touch screen, a touchpad, a keyboard, a mouse, an image sensor, a microphone, an accelerometer, or a gyroscope; an output device 1008 including, for example, a liquid crystal display (LCD), a speaker, or a vibrator; a storage device 1003 including, for example, a magnetic tape or a hard disk; and a communication device 1009. The communication device 1009 allows the electronic device to communicate with other devices wirelessly or by wire to exchange data. Although the figure shows an electronic device with various devices, it should be understood that not all of the devices shown need to be implemented or provided; more or fewer devices may alternatively be implemented or provided.

In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication device 1009, installed from the storage device 1003, or installed from the ROM 1002. When the computer program is executed by the processing device 1001, the above functions defined in the methods of the embodiments of the present disclosure are performed. The electronic device provided by the present application adopts the metadata management method of Embodiment 1 above, aiming to solve the technical problem of excessive access latency caused by the overly deep input/output software stack of traditional file systems.
Compared with the prior art, the beneficial effects of the electronic device provided by the embodiments of the present application are the same as those of the metadata management method provided by the above embodiment, and its other technical features are the same as those disclosed for the method above; they are not repeated here. It should be understood that the various parts of the present disclosure can be implemented in hardware, software, firmware, or a combination thereof. In the description of the above implementations, specific features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples. The above is only a specific implementation of the present application, but the scope of protection of the present application is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims. This embodiment provides a computer-readable storage medium having computer-readable program instructions stored thereon, the instructions being used to execute the metadata management method of Embodiment 1 above.

The computer-readable storage medium provided in the embodiments of the present application may be, for example, a USB flash drive, but is not limited to this; it may be an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this embodiment, the computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in combination with, an instruction execution system, apparatus, or device. Program code contained on the computer-readable storage medium may be transmitted over any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), or any suitable combination of the above. The computer-readable storage medium may be included in the electronic device, or it may exist separately without being assembled into the electronic device.

The computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: in response to an access request sent by a client, obtain the path information of the node to be accessed from the access request; call the remote direct memory access network and search the memory pool of the metadata management system for the memory address of the node to be accessed according to the path information; and obtain the access metadata of the node to be accessed through the memory address and perform data operations on the access metadata.
Computer program code for performing the operations of the present disclosure can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a LAN (local area network) or a WAN (wide area network), or may be connected to an external computer (for example, through the Internet using an Internet service provider). The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of devices, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based device that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions. The modules involved in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the module itself. The computer-readable storage medium provided by the present application stores computer-readable program instructions for executing the above metadata management method, aiming to solve the technical problem of excessive access latency caused by the overly deep input/output software stack of traditional file systems. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided in the embodiments of the present application are the same as those of the metadata management method provided in the above embodiment and are not repeated here. The present application also provides a computer program product including a computer program which, when executed by a processor, implements the steps of the metadata management method described above. The computer program product provided by the present application likewise aims to solve the technical problem of excessive access latency caused by the overly deep input/output software stack of traditional file systems. Compared with the prior art, its beneficial effects are the same as those of the metadata management method provided by the above embodiment and are not repeated here.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application thereof in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims

1. A metadata management method, characterized in that it is applied to a server configured on a metadata management system, the server and a memory pool on the metadata management system establish a communication connection through a remote direct memory access network, and the method comprises:

In response to an access request sent by a client, obtaining path information of a node to be accessed from the access request;

Calling the remote direct memory access network to search the memory pool of the metadata management system for the memory address of the node to be accessed according to the path information;

Obtaining the access metadata of the node to be accessed through the memory address, and performing data operations on the access metadata.

2. The method according to claim 1, wherein the step of calling the remote direct memory access network and searching the memory address of the node to be accessed in the memory pool of the metadata management system according to the path information comprises:

Determining a root directory node from the path information, calling the remote direct memory access network, and reading, from the memory pool, the root metadata bucket in which the root directory node is located;

Taking the root directory node as the target node, searching the root metadata bucket for a target hash bucket that matches the target hash value of the target node, and searching the target metadata of the target node in the target hash bucket;

In a case where the access rights of the target metadata are open to the client, obtaining a secondary path node of the target node in the path information, and calculating a secondary hash value of the secondary path node according to the target hash value and a secondary name of the secondary path node;

Searching the root metadata bucket for a secondary hash bucket that matches the secondary hash value, and, if the secondary hash bucket contains node metadata of the secondary path node, determining whether the secondary path node is the node to be accessed;

If so, determining the memory address according to the secondary hash value.
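The lookup of claim 2 can be sketched as a walk over one in-memory copy of the root metadata bucket. Several details here are assumptions, not taken from the source: the secondary hash is derived by hashing the parent's 64-bit hash together with the child's name (SHA-256 truncated to 8 bytes), a metadata bucket holds a fixed array of 64 hash buckets, access rights are a set of client IDs, and the returned memory address is modelled as a `(hash bucket, slot)` pair. All names are hypothetical.

```python
import hashlib

NUM_BUCKETS = 64  # hypothetical number of hash buckets per metadata bucket


def child_hash(parent_hash: int, name: str) -> int:
    """Assumed derivation: hash of the parent's 64-bit hash concatenated with the name."""
    data = parent_hash.to_bytes(8, "little") + name.encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")


ROOT_HASH = child_hash(0, "/")


class MetadataBucket:
    """Stand-in for a root metadata bucket fetched with one RDMA read."""

    def __init__(self):
        self.hash_buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, h, name, meta):
        self.hash_buckets[h % NUM_BUCKETS].append((h, name, meta))

    def find(self, h, name):
        b = h % NUM_BUCKETS
        for slot, (entry_hash, entry_name, meta) in enumerate(self.hash_buckets[b]):
            if entry_hash == h and entry_name == name:
                return (b, slot), meta
        return None, None


def resolve(root, path, client):
    """Return the (hash bucket, slot) address of `path`, or None if it is not in `root`."""
    target_hash = ROOT_HASH
    addr, meta = root.find(target_hash, "/")
    if meta is None:
        return None
    for name in [p for p in path.split("/") if p]:
        if client not in meta["allowed"]:            # access rights of the target node
            raise PermissionError(name)
        target_hash = child_hash(target_hash, name)  # secondary hash: parent hash + name
        addr, meta = root.find(target_hash, name)
        if meta is None:
            return None                              # claim 3 would try an extended bucket
    return addr
```

One plausible motivation for chaining hashes this way is that the client can compute every level's bucket position locally, so the walk needs no server round trip per path component once the bucket has been read.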

3. The method of claim 2, wherein after the step of searching the root metadata bucket for a secondary hash bucket that matches the secondary hash value, the method further comprises:

If the node metadata of the secondary path node does not exist in the secondary hash bucket, determining whether the parent node of the secondary path node has an extended metadata bucket;

In the case where the parent node has an extended metadata bucket, reading the extended metadata bucket in the memory pool through a remote direct memory access network, and searching for an extended hash bucket of the secondary path node in the extended metadata bucket;

If the node metadata of the secondary path node exists in the extended hash bucket and the secondary path node is the node to be accessed, determining the memory address according to the secondary hash value.
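The fallback of claim 3 adds one step to a child lookup: on a miss in the root metadata bucket, consult the parent's extended metadata bucket if it has one. In this sketch, the `ext_bucket` field in the parent's metadata, the dictionary standing in for the memory pool, the bucket shape, and the hash derivation are all hypothetical choices, not details from the source.

```python
import hashlib
from typing import Dict, List, Tuple

NUM_BUCKETS = 64  # hypothetical hash-bucket count per metadata bucket

Entry = Tuple[int, str, dict]          # (hash, name, node metadata)
MetaBucket = Dict[int, List[Entry]]    # hash-bucket index -> entries


def child_hash(parent_hash: int, name: str) -> int:
    """Assumed derivation: parent's 64-bit hash concatenated with the child's name."""
    data = parent_hash.to_bytes(8, "little") + name.encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")


def find(bucket: MetaBucket, h: int, name: str):
    idx = h % NUM_BUCKETS
    for slot, (entry_hash, entry_name, meta) in enumerate(bucket.get(idx, [])):
        if entry_hash == h and entry_name == name:
            return (idx, slot), meta
    return None, None


def lookup_child(root: MetaBucket, pool: Dict[str, MetaBucket],
                 parent_meta: dict, parent_hash: int, name: str):
    """Return (bucket id, address, metadata) for the child, or None if it does not exist."""
    h = child_hash(parent_hash, name)
    addr, meta = find(root, h, name)
    if meta is not None:
        return "root", addr, meta
    ext_id = parent_meta.get("ext_bucket")  # does the parent own an extended bucket?
    if ext_id is None:
        return None
    ext = pool[ext_id]                      # stands in for an RDMA read of that bucket
    addr, meta = find(ext, h, name)
    return None if meta is None else (ext_id, addr, meta)
```

Because the same secondary hash indexes both buckets, the fallback costs one extra remote read and no rehashing.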

4. The method according to claim 1, wherein the data operation comprises a write operation; and the step of performing data operation on the access metadata comprises:

In the case where the type of the access request is a write request, calling a remote direct memory access replacement operation to determine whether a lock area of an access metadata bucket where the access metadata is located is empty;

If it is empty, determining that the server is a modification master node; writing, through a remote direct memory access replacement operation, the master configuration information of the modification master node into the lock area of the access metadata bucket; obtaining the write open permission in the access metadata bucket; and performing a write operation on the access metadata in the access metadata bucket;

If it is not empty, determining that the server is a modification slave node; obtaining the master configuration information from the lock area of the access metadata bucket; establishing a remote direct memory access network communication connection with the modification master node according to the master configuration information; and sending the access metadata to the modification master node, so that the modification master node performs the write operation on the access metadata.
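The role decision of claim 4 reduces to a single atomic swap on the lock area. The sketch reads the "replacement operation" as an atomic compare-and-swap, which is our assumption; RDMA atomics operate on 8-byte words, so a real lock word would hold a compact identifier, whereas here a mutex emulates atomicity and the word holds a whole configuration string. `LockArea` and `decide_role` are hypothetical names.

```python
import threading
from typing import Tuple


class LockArea:
    """Hypothetical lock area of an access metadata bucket (CAS emulated with a mutex)."""

    def __init__(self):
        self._value = None  # None means "empty": no modification master yet
        self._mu = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._mu:
            old = self._value
            if old == expected:
                self._value = new
            return old


def decide_role(lock: LockArea, my_config: str) -> Tuple[str, str]:
    """Return ("master", own config) or ("slave", master's config read from the lock area)."""
    old = lock.compare_and_swap(None, my_config)
    if old is None:
        return "master", my_config  # lock was empty: this server holds write permission
    return "slave", old             # connect to `old` over RDMA and forward writes to it
```

A useful property of this design is that the losing servers learn the winner's configuration from the same swap that told them they lost, so no extra read of the lock area is needed.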

5. The method according to claim 4, characterized in that the master configuration information includes node configuration information and remote direct memory access configuration information, and the access metadata includes metadata to be modified;

The step of writing the master configuration information of the modification master node into the lock area of the access metadata bucket through the remote direct memory access replacement operation, obtaining the write open permission in the access metadata bucket, and performing a write operation on the access metadata in the access metadata bucket includes:

Writing, through the remote direct memory access replacement operation, the node configuration information of the modification master node into the lock area of the access metadata bucket, creating a log space in the modification master node, and determining the remote direct memory access configuration information of the log space;

Registering the remote direct memory access configuration information of the log space to the modification master node, writing the remote direct memory access configuration information into the lock area, and obtaining the write open permission in the access metadata bucket;

Performing a write operation on the access metadata when the write open permission in the access metadata bucket has been obtained.
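Claim 5 implies that the lock area ends up holding both who the master is and how to reach its log space. The sketch assumes a fixed 26-byte record (node ID, IPv4 address, port, log-space address, rkey, length) that the source does not specify, and shows the claimed order: node configuration first, then the log space is created and registered, then its RDMA configuration is published. Plain slice assignment stands in for the replacement operation, and `register` stands in for memory registration; in libibverbs that role is played by `ibv_reg_mr`, whose returned `struct ibv_mr` carries the `addr` and `rkey` a remote peer needs. All other names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical lock-area layout, little-endian with no padding:
# u32 node id | 4-byte IPv4 address | u16 port | u64 log-space address | u32 rkey | u32 length
LOCK_FMT = "<I4sHQII"
LOCK_SIZE = struct.calcsize(LOCK_FMT)  # 26 bytes


@dataclass
class MasterConfig:
    node_id: int
    ip: bytes
    port: int
    log_addr: int = 0
    rkey: int = 0
    log_len: int = 0

    def pack(self) -> bytes:
        return struct.pack(LOCK_FMT, self.node_id, self.ip, self.port,
                           self.log_addr, self.rkey, self.log_len)

    @classmethod
    def unpack(cls, raw: bytes) -> "MasterConfig":
        return cls(*struct.unpack(LOCK_FMT, raw))


def acquire_write_permission(lock_area: bytearray, node_id: int, ip: bytes, port: int,
                             log_len: int,
                             register: Callable[[bytearray], Tuple[int, int]]):
    """Claim-5 sequence; `register` returns (address, rkey) for a registered buffer."""
    cfg = MasterConfig(node_id, ip, port)
    lock_area[:] = cfg.pack()                     # 1. node configuration into the lock area
    log_space = bytearray(log_len)                # 2. create the log space locally
    cfg.log_addr, cfg.rkey = register(log_space)  # 3. its RDMA configuration
    cfg.log_len = log_len
    lock_area[:] = cfg.pack()                     # 4. publish it; write permission now held
    return cfg, log_space
```

A slave that reads the lock area after step 4 can unpack the record and address the master's log space directly.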

6. The method according to claim 5, characterized in that, after the step of obtaining the write open permission in the access metadata bucket, the method further comprises:

Monitoring whether the index area of the log space is updated; if the index area is updated, obtaining an index increment from the index area, and obtaining the metadata to be modified and its modification information from the log area associated with the index increment;

Modifying the metadata to be modified in the access metadata bucket according to the modification information, and updating the modification status of the metadata to be modified in the result area associated with the index increment.
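One monitoring pass of claim 6 can be written as a pure function over the log space, which makes the ordering easy to check. The dataclass layout, the `"ok"`/`"ENOENT"` status strings, and the rule of stopping at a slot whose index was claimed but whose log entry has not yet landed are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class LogSpace:
    index: int = 0                                                  # index area
    log: Dict[int, Tuple[str, dict]] = field(default_factory=dict)  # log area
    result: Dict[int, str] = field(default_factory=dict)            # result area


def drain_once(space: LogSpace, bucket: Dict[str, dict], applied: int) -> int:
    """Apply every log entry published after index `applied` and return the new mark.

    Stops early at an index whose log entry has not been written yet.
    """
    while applied < space.index and (applied + 1) in space.log:
        applied += 1
        name, changes = space.log[applied]   # metadata to be modified + modification info
        if name in bucket:
            bucket[name].update(changes)     # modify inside the access metadata bucket
            space.result[applied] = "ok"
        else:
            space.result[applied] = "ENOENT" # hypothetical failure status
    return applied
```

Applying entries strictly in index order means the master serialises concurrent writers without any locking among the slaves themselves.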

7. The method according to claim 4, wherein the access metadata includes metadata to be modified, and the step of sending the access metadata to the modification master node to perform a write operation on the access metadata through the modification master node comprises:

Calling a remote direct memory access self-increment operation to increment the index in the index area of the log space of the modification master node, obtaining the index increment of the modification slave node;

Writing, according to the index increment, the metadata to be modified into the log area associated with the index increment, so that the modification master node modifies the metadata to be modified in the log area;

Polling the result area associated with the index increment, and, when it is detected that the modification status of the metadata to be modified in the result area has been updated, feeding back the modification result of the metadata to be modified to the client of the modification slave node.
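Claim 7 leaves open how an index increment maps to the associated log and result slots. The sketch assumes a flat layout, an 8-byte index area followed by fixed 256-byte log slots and 1-byte result slots, reused modulo the slot count, with a result byte of 0 meaning "not yet applied". `FakeRemote` mimics one-sided verbs (atomic fetch-and-add, write, read) against a local buffer and is not a real verbs API. Back-pressure when slots wrap is not modelled. All names are hypothetical.

```python
import struct
from dataclasses import dataclass


# Hypothetical layout: [index area: u64][log: slots x log_slot B][result: slots x 1 B]
@dataclass(frozen=True)
class LogSpaceLayout:
    base: int           # remote address advertised in the master's lock area
    slots: int          # number of log slots, reused modulo `slots`
    log_slot: int = 256

    def index_addr(self) -> int:
        return self.base

    def log_addr(self, idx: int) -> int:
        return self.base + 8 + ((idx - 1) % self.slots) * self.log_slot

    def result_addr(self, idx: int) -> int:
        return self.base + 8 + self.slots * self.log_slot + (idx - 1) % self.slots

    def size(self) -> int:
        return 8 + self.slots * (self.log_slot + 1)


class FakeRemote:
    """In-process stand-in for one-sided RDMA operations on the master's log space."""

    def __init__(self, size: int):
        self.mem = bytearray(size)

    def fetch_and_add(self, addr: int, delta: int) -> int:
        (old,) = struct.unpack_from("<Q", self.mem, addr)
        struct.pack_into("<Q", self.mem, addr, old + delta)
        return old

    def write(self, addr: int, data: bytes) -> None:
        self.mem[addr:addr + len(data)] = data

    def read(self, addr: int, n: int) -> bytes:
        return bytes(self.mem[addr:addr + n])


def submit(remote: FakeRemote, layout: LogSpaceLayout, payload: bytes) -> int:
    """Claim an index increment, then write the change into the matching log slot."""
    if len(payload) > layout.log_slot:
        raise ValueError("payload larger than a log slot")
    idx = remote.fetch_and_add(layout.index_addr(), 1) + 1
    remote.write(layout.log_addr(idx), payload.ljust(layout.log_slot, b"\0"))
    return idx


def poll_status(remote: FakeRemote, layout: LogSpaceLayout, idx: int) -> int:
    """Read the result slot for `idx`; 0 is assumed to mean 'not yet applied'."""
    return remote.read(layout.result_addr(idx), 1)[0]
```

Since the slot addresses are pure arithmetic on the index increment, the slave needs only the base address and rkey from the lock area to both write its entry and poll its result.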

8. A metadata management device, characterized in that it is applied to a server configured on a metadata management system, the server establishes a communication connection with a memory pool on the metadata management system via a remote direct memory access network, and the device comprises:

A response module, used to respond to an access request sent by a client and obtain path information of a node to be accessed from the access request;

A calling module, used for calling a remote direct memory access network, and searching for a memory address of the node to be accessed in a memory pool of the metadata management system according to the path information;

An operation module, used to obtain the access metadata of the node to be accessed through the memory address and perform data operations on the access metadata.

9. An electronic device, characterized in that the electronic device comprises:

at least one processor;

and a memory communicatively coupled to the at least one processor;

The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the metadata management method described in any one of claims 1 to 7.

10. A storage medium, characterized in that the storage medium is a computer-readable storage medium, and a program for implementing a metadata management method is stored on the computer-readable storage medium, and the program for implementing the metadata management method is executed by a processor to implement the steps of the metadata management method as described in any one of claims 1 to 7.