
CN116048425B - A layered cache method, system and related components - Google Patents

A layered cache method, system and related components

Info

Publication number
CN116048425B
Authority
CN
China
Prior art keywords
file
data
operation request
layered
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310220769.3A
Other languages
Chinese (zh)
Other versions
CN116048425A (en)
Inventor
臧林劼
何怡川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IEIT Systems Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN202310220769.3A priority Critical patent/CN116048425B/en
Publication of CN116048425A publication Critical patent/CN116048425A/en
Application granted granted Critical
Publication of CN116048425B publication Critical patent/CN116048425B/en
Priority to PCT/CN2024/080583 priority patent/WO2024183799A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893Caches characterised by their organisation or structure
    • G06F12/0897Caches characterised by their organisation or structure with two or more cache hierarchy levels
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a layered caching method, system and related components, relating to the field of distributed storage. The layered caching method is applied to each computing node of a distributed storage system and includes: using a client process to monitor file IO operation requests issued by a client to the distributed storage system and, when a file IO operation request is detected, redirecting the file IO operation request to a server process; using the server process to determine whether the target storage location corresponding to the file IO operation request is the aggregation cache layer; if not, reading the data corresponding to the file IO operation request from the underlying layer of the distributed storage system and caching the data in the aggregation cache layer; if so, reading the data from the aggregation cache layer and returning the data to the client process, so that the client process returns the data to the client. This application can improve the IO performance of massive small-file data sets and alleviate the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system workloads.

Figure 202310220769

Description

A layered caching method, system and related components

Technical Field

The present application relates to the field of distributed storage, and in particular to a layered caching method, system and related components.

Background

With the rapid growth of HPC (High Performance Computing) cluster computing power, large-scale, highly concurrent applications place considerable pressure on the IO (Input/Output) of distributed storage systems. High-performance HPC scenarios involve three key elements. Element one: the data and label metadata attributes of each high-performance computing node require large-scale, highly concurrent storage IO, of which about 80% consists of loading and modifying data sets for HPC training and random data retrieval; the typical intensive IO model is the formula shown as Figure SMS_1. Element two: the HPC computing process itself, including data preprocessing, data labeling and data compression. Element three: consistent, synchronized updates of distributed data.

Analysis of these three key elements shows that the massive numbers of small files, highly concurrent IO and random access generated by high-performance computing easily saturate the read/write IO performance of a distributed storage system. For example, a common HPC data set generally contains more than 2 million small files of about 3000 different types. If the storage IO read/write software stack cannot meet the requirements of large-scale HPC operation, high-performance computing jobs will be blocked; the IO performance of the distributed storage system is therefore critical to high-performance computing workloads. Existing technical solutions propose some optimizations for improving storage IO performance for high-performance computing, such as prefetching and caching. However, performing large-scale, highly concurrent storage IO with existing solutions in HPC scenarios still faces many technical challenges; for example, read-intensive high-performance IO on small files generates enormous metadata service overhead in the distributed storage system, which in turn degrades data storage efficiency.

Therefore, how to provide a solution to the above technical problems is a problem that those skilled in the art currently need to solve.

Summary of the Invention

The purpose of this application is to provide a layered caching method, system and related components that can improve the IO performance of massive small-file data sets and alleviate the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system workloads.

To solve the above technical problems, this application provides a layered caching method applied to each computing node of a distributed storage system. The layered caching method includes:

using a client process to monitor file IO operation requests issued by a client to the distributed storage system and, when a file IO operation request is detected, redirecting the file IO operation request to a server process;

using the server process to determine whether the target storage location corresponding to the file IO operation request is the aggregation cache layer;

if not, reading the data corresponding to the file IO operation request from the underlying layer of the distributed storage system and caching the data in the aggregation cache layer;

if so, reading the data from the aggregation cache layer and returning the data to the client process, so that the client process returns the data to the client.

Optionally, the process of using the server process to determine whether the target storage location corresponding to the file IO operation request is the aggregation cache layer includes:

using the server process to insert the received file IO operation request into a shared queue;

in the shared queue, determining whether the data corresponding to the file IO operation request is cached data;

if so, determining that the target storage location corresponding to the file IO operation request is the aggregation cache layer.

Optionally, the process of determining, in the shared queue, whether the data corresponding to the file IO operation request is cached data includes:

in the shared queue, determining through a data thread whether the data corresponding to the file IO operation request is cached data, where the data thread is a thread created when the server process instance is constructed.

Optionally, the layered caching method further includes:

when a file IO operation request issued by the client to the distributed storage system is detected, starting the server process and dynamically constructing the server process instance using the state of the local computing node and the states of other computing nodes adjacent to the local computing node.

Optionally, the process of reading the data from the aggregation cache layer includes:

redirecting the file IO operation request to the aggregation cache layer through a data thread, so as to read the data from the aggregation cache layer.

Optionally, while using the server process to insert the received file IO operation request into the shared queue, the layered caching method further includes:

configuring a mutex for the shared queue.

Optionally, the shared queue is a FIFO queue.

Optionally, the data corresponding to the file IO operation request includes a file descriptor, a read offset and a length.

Optionally, the layered caching method further includes:

determining the storage location of the data in the aggregation cache layer based on the file path and the computing node to which it belongs.

Optionally, the layered caching method further includes:

broadcasting the file IO operation request to computing nodes adjacent to the local computing node.

Optionally, the layered caching method further includes:

determining whether the data set corresponding to the file IO operation request is larger than the total capacity of the local storage medium;

if so, performing a cache eviction operation and a replacement operation.

Optionally, the layered caching method further includes:

building a dynamic link library based on an environment variable, where the dynamic link library is used to intercept the file IO operation request.

Optionally, the process of redirecting the file IO operation request to the server process includes:

redirecting the file IO operation request to the server process through a hash algorithm.

Optionally, the aggregation cache layer is a cache layer composed of the high-speed storage media in the computing nodes of the distributed storage system.

Optionally, the high-speed storage medium is an NVMe SSD.

Optionally, the layered caching method further includes:

when a clearing condition is met, clearing the data stored in the high-speed storage medium on the local computing node.

To solve the above technical problems, this application also provides a layered caching system applied to each computing node of a distributed storage system. The layered caching system includes:

a monitoring module, configured to use a client process to monitor file IO operation requests issued by a client to the distributed storage system and, when a file IO operation request is detected, redirect the file IO operation request to a server process;

a processing module, configured to use the server process to determine whether the target storage location corresponding to the file IO operation request is the aggregation cache layer, and if not, trigger a first reading module, and if so, trigger a second reading module;

the first reading module, configured to read the data corresponding to the file IO operation request from the underlying layer of the distributed storage system and cache the data in the aggregation cache layer;

the second reading module, configured to read the data from the aggregation cache layer and return the data to the client process, so that the client process returns the data to the client.

To solve the above technical problems, this application also provides an electronic device, including:

a memory for storing a computer program;

a processor, configured to implement, when executing the computer program, the steps of any one of the layered caching methods described above.

To solve the above technical problems, this application also provides a distributed storage system, including an underlying storage module and a plurality of nodes. Each node includes a layered client process, a layered server process and a storage medium, and the storage media of the nodes constitute an aggregation cache layer, where:

the layered client process is configured to monitor file IO operation requests issued by a client and, when a file IO operation request is detected, redirect the file IO operation request to the layered server process;

the layered server process is configured to determine whether the target storage location corresponding to the file IO operation request is the aggregation cache layer, and if not, read the data corresponding to the file IO operation request from the underlying storage module and send the data to the aggregation cache layer, and if so, read the data from the aggregation cache layer and return the data to the layered client process, so that the layered client process returns the data to the client;

the aggregation cache layer is configured to store the data sent by the layered server process.

To solve the above technical problems, this application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of any one of the layered caching methods described above are implemented.

This application provides a layered caching method: after the client process detects a file IO operation request, it redirects the request to the server process; the server process first looks up the file in the aggregation cache layer and only retrieves the data from the underlying layer of the distributed storage system on an aggregation cache miss, thereby improving the IO performance of massive small-file data sets and alleviating the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system workloads. This application also provides a layered caching system, an electronic device, a distributed storage system and a computer-readable storage medium, which have the same beneficial effects as the above layered caching method.

Brief Description of the Drawings

In order to explain the embodiments of the present application more clearly, the drawings required for the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

Figure 1 is a flow chart of the steps of a layered caching method provided by the present application;

Figure 2 is a schematic diagram of the architecture of a layered caching system provided by the present application;

Figure 3 is a schematic diagram of a layered caching framework of a distributed storage system provided by the present application;

Figure 4 is a schematic structural diagram of a layered caching system provided by the present application.

Detailed Description of the Embodiments

The core of this application is to provide a layered caching method, system and related components that can improve the IO performance of massive small-file data sets and alleviate the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system workloads.

In order to make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part, not all, of the embodiments of this application. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of this application.

In a first aspect, referring to Figure 1, Figure 1 is a flow chart of the steps of a layered caching method provided by the present application. The layered caching method includes:

S101: using a client process to monitor file IO operation requests issued by a client to the distributed storage system and, when a file IO operation request is detected, redirecting the file IO operation request to a server process;

This embodiment proposes an aggregation cache layer, which is a transparent read-only cache layer. A cache is built for the typical intensive IO model (formula shown as Figure SMS_2), and the local storage of distributed cluster nodes and their adjacent nodes is aggregated to accelerate data-read performance, thereby improving the IO performance of massive small-file data sets.

The client accesses the distributed storage system through the POSIX file system interface provided by the distributed storage system, so as to accelerate storage IO access performance in high-performance HPC scenarios; such scenarios feature read-only data with a high re-read rate and follow the typical intensive IO model (formula shown as Figure SMS_3). Referring to Figure 2, the architecture of the layered caching system provided by this application consists of two main components, a layered-cache client process and a layered-cache server process, which are deployed at the client layer of the distributed storage system. When a job is assigned to a group of computing nodes on the HPC cluster, the layered-cache server process is started, and a server process instance is dynamically constructed using the local storage of the distributed storage node and its adjacent nodes. Each node of the distributed storage system deploys a layered-cache client process and server process, which cache the data requested by HPC jobs on the node's NVMe SSD high-speed storage device.

Specifically, the layered caching flow of this application is described with reference to Figure 3. First, the client process is preloaded; the client process monitors and intercepts file system operations such as open, read and close. File system calls are intercepted through this process, so no modification of existing high-performance computing applications or of the underlying file system of the distributed storage system is required. It can be understood that the layered-cache client process consists of a file system IO interface forwarding module, which captures file system calls to the distributed storage system and redirects them to the corresponding layered-cache server process. Therefore, the layered-cache client process first reads hit data through the aggregation cache layer, which helps meet the performance requirements of high-performance workloads.

S102: using the server process to determine whether the target storage location corresponding to the file IO operation request is the aggregation cache layer; if not, executing S103; if so, executing S104;

S103: reading the data corresponding to the file IO operation request from the underlying layer of the distributed storage system and caching the data in the aggregation cache layer;

S104: reading the data from the aggregation cache layer and returning the data to the client process, so that the client process returns the data to the client.
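The following minimal C sketch illustrates this server-side decision (S102-S104); it is only an illustration under assumptions, and the helper names agg_cache_lookup, storage_read and agg_cache_insert are hypothetical stand-ins for the aggregation-cache lookup, the read from the storage bottom layer and the cache fill described above, not functions defined by this application.

#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical request descriptor; the fields mirror the data named in this
 * application (file path, read offset and length). */
struct io_request {
    const char *path;    /* file requested by the client  */
    off_t       offset;  /* read offset within the file   */
    size_t      length;  /* number of bytes requested     */
};

/* Assumed helpers provided by the server process. */
bool    agg_cache_lookup(const struct io_request *req, void *buf);            /* S102 */
ssize_t storage_read(const struct io_request *req, void *buf);                /* S103 */
void    agg_cache_insert(const struct io_request *req, const void *buf, ssize_t n);

/* Serve one redirected request: hit the aggregation cache when possible,
 * otherwise read from the storage bottom layer and populate the cache. */
ssize_t serve_request(const struct io_request *req, void *buf)
{
    if (agg_cache_lookup(req, buf))          /* S102: already cached         */
        return (ssize_t)req->length;         /* S104: return cached data     */

    ssize_t n = storage_read(req, buf);      /* S103: read from bottom layer */
    if (n > 0)
        agg_cache_insert(req, buf, n);       /* S103: cache for later reads  */
    return n;                                /* data goes back to the client */
}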

After receiving a file IO operation request intercepted by the client process, the server process retrieves the file from the underlying layer of the distributed storage system only on a cache miss. It can be understood that two reading scenarios are described below: a first read and a non-first read.

For the first read:

A computing node client in the high-performance HPC scenario initiates a read request for a data set directory on the distributed storage system. The layered-cache client process intercepts any incoming file IO operation request and starts tracking it in the data set directory. The RPC (remote procedure call) handler of the layered-cache client process redirects the requested file IO operation to the corresponding layered-cache server process; the internal RPC handlers of the layered-cache client process and server process are responsible for sending and receiving messages over the network.

As an optional embodiment, the process of using the server process to determine whether the target storage location corresponding to the file IO operation request is the aggregation cache layer includes: using the server process to insert the received file IO operation request into a shared queue; in the shared queue, determining whether the data corresponding to the file IO operation request is cached data; and if so, determining that the target storage location corresponding to the file IO operation request is the aggregation cache layer.

As an optional embodiment, the process of determining, in the shared queue, whether the data corresponding to the file IO operation request is cached data includes: in the shared queue, determining through a data thread whether the data corresponding to the file IO operation request is cached data, where the data thread is a thread created when the server process instance is constructed.

Specifically, when the layered-cache server process receives a file IO operation request, the RPC handler inserts the forwarded file IO into a shared FIFO (First In First Out) queue. In the shared FIFO queue, the move-data thread checks whether the file has already been cached. Since the file is being read for the first time, the data needs to be pulled into the aggregation cache, and the cached file descriptor, read offset and length are determined.

For a non-first read:

As an optional embodiment, the process of reading data from the aggregation cache layer includes:

redirecting the file IO operation request to the aggregation cache layer through the data thread, so as to read the data from the aggregation cache layer.

Specifically, the data thread redirects the IO to the aggregation cache to read the file and returns the file descriptor to the corresponding layered-cache client process; that process then returns the file descriptor, read offset, length and other data to the HPC application, i.e., the HPC high-performance client requester.

It can be seen that, in this embodiment, after the client process detects a file IO operation request, it redirects the request to the server process; the server process first looks up the file in the aggregation cache layer and only retrieves the data from the underlying layer of the distributed storage system on an aggregation cache miss, thereby improving the IO performance of massive small-file data sets and alleviating the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system workloads.

On the basis of the above embodiments:

As an optional embodiment, while using the server process to insert the received file IO operation request into the shared queue, the layered caching method further includes:

configuring a mutex for the shared queue.

Specifically, considering that multiple layered-cache client processes may request a single file at the same time, a mutex is used on the shared queue to ensure consistency and to avoid copying the same file into the aggregation cache more than once.
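A minimal sketch, under assumptions, of how such a mutex-protected shared FIFO queue could look in C; the structure and function names (cache_request, shared_fifo, fifo_push, fifo_pop) are illustrative and not part of this application, and the request fields mirror the file descriptor, read offset and length mentioned above.

#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

/* One forwarded file IO request. */
struct cache_request {
    int    fd;       /* file descriptor */
    off_t  offset;   /* read offset     */
    size_t length;   /* read length     */
};

struct fifo_node {
    struct cache_request req;
    struct fifo_node    *next;
};

/* Shared FIFO queue guarded by a mutex, so that concurrent requests for the
 * same file are serialized and the file is not copied into the cache twice. */
struct shared_fifo {
    struct fifo_node *head, *tail;
    pthread_mutex_t   lock;   /* e.g. initialized with PTHREAD_MUTEX_INITIALIZER */
};

/* Called by the RPC handler when a forwarded request arrives. */
void fifo_push(struct shared_fifo *q, struct cache_request req)
{
    struct fifo_node *n = malloc(sizeof(*n));
    n->req  = req;
    n->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
    pthread_mutex_unlock(&q->lock);
}

/* Called by the move-data thread, which then checks whether the file is
 * already cached before pulling it into the aggregation cache. */
struct fifo_node *fifo_pop(struct shared_fifo *q)
{
    pthread_mutex_lock(&q->lock);
    struct fifo_node *n = q->head;
    if (n) {
        q->head = n->next;
        if (!q->head) q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    return n;   /* caller processes and frees the node */
}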

As an optional embodiment, the layered caching method further includes:

determining the storage location of the data in the aggregation cache layer based on the file path and the computing node to which it belongs.

As an optional embodiment, the layered caching method further includes:

broadcasting the file IO operation request to computing nodes adjacent to the local computing node.

Specifically, the layered-cache process broadcasts the file lookup request to adjacent nodes, which helps balance the load among the nodes.

As an optional embodiment, the layered caching method further includes:

determining whether the data set corresponding to the file IO operation request is larger than the total capacity of the local storage medium;

if so, performing a cache eviction operation and a replacement operation.

Specifically, the aggregation-cache eviction mechanism is tied to jobs that repeatedly read a data set: if the data set is larger than the total capacity of the node's local cache, cache eviction and replacement are performed.
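As a rough illustration only, the admission check could take the following form in C; the application does not specify the replacement policy, so the least-recently-used eviction and all helper names below (local_cache_capacity, local_cache_used, evict_least_recently_used, cache_file) are assumptions made for this sketch.

#include <stdbool.h>
#include <stdint.h>

/* Assumed accessors for the node-local NVMe cache state. */
uint64_t local_cache_capacity(void);       /* total bytes of local cache    */
uint64_t local_cache_used(void);           /* bytes currently occupied      */
bool     evict_least_recently_used(void);  /* frees one cached file, if any */
bool     cache_file(const char *path);     /* pulls one file into the cache */

/* Evict until the requested bytes fit, then cache (replace) the new file. */
bool admit_to_cache(const char *path, uint64_t requested_bytes)
{
    while (requested_bytes > local_cache_capacity() - local_cache_used()) {
        if (!evict_least_recently_used())
            return false;          /* nothing left to evict: cannot fit  */
    }
    return cache_file(path);       /* replacement: bring in the new data */
}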

As an optional embodiment, the layered caching method further includes:

building a dynamic link library based on an environment variable, where the dynamic link library is used to intercept file IO operation requests.

Specifically, this embodiment implements a redirect-only-read aggregation cache, i.e., a mechanism for intercepting read IO. Based on an initial prototype of high-performance jobs on the distributed storage system, the aggregation cache layer of this embodiment helps analyze high-performance business scenarios, in particular the IO read calls among the three key elements, in order to understand how the data loader in the framework accesses files. In this embodiment, the aggregation cache layer intercepts IO-related function calls by selectively loading the same functions from different dynamic link libraries, which avoids forcing applications to modify their code base to support the aggregation cache layer.

Specifically, the read IO interception mechanism of this embodiment redirects to the dynamic link library Only_Read_Performance.so. The technical advantage of a dynamic link library is that once a function in the library changes, the change is transparent to the executable program, which does not need to be recompiled. For statically linked programs used in other techniques, a small change in the function library requires the entire program to be recompiled and released. Static linking compiles all referenced functions or variables into the executable file, whereas dynamic linking does not compile the functions into the executable but loads the function library dynamically while the program runs, i.e., links at run time. Therefore, redirecting to the dynamic link library Only_Read_Performance.so provides compatibility and portability, which is of great value to the distributed storage system.

The specific steps of the read IO interception mechanism are as follows:

(1) HPC job client requests are file system requests that satisfy standard POSIX semantics and conform to the typical intensive IO model (formula shown as Figure SMS_4), and they call into the underlying distributed storage file system;

(2) the environment variable LD_PRELOAD of the Linux servers of the distributed storage system is used; it loads a dynamic library with the highest priority, and it is the means by which this application intercepts the read request processing logic;

(3) input of the read IO interception mechanism:

a. the file system call (formula shown as Figure SMS_5);

b. the LD_PRELOAD environment variable;

c. the dynamic link library of the local aggregation cache layer, denoted Only_Read_Performance.so;

(4) output of the read IO interception mechanism:

Only_Read_Performance.so is executed and, as in the steps above, the read-cache logic is processed at the cache aggregation layer.
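A minimal sketch of the kind of read() interposer that such a library could contain, using the standard LD_PRELOAD plus dlsym(RTLD_NEXT) technique; the redirect_to_aggregation_cache helper is a hypothetical hook into the layered-cache client process, and the build command in the comment is only an assumption.

/* Assumed build: gcc -shared -fPIC -o Only_Read_Performance.so read_hook.c -ldl */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdbool.h>
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical hook: returns true and fills buf when the request could be
 * served from the aggregation cache layer. */
bool redirect_to_aggregation_cache(int fd, void *buf, size_t count, ssize_t *served);

/* Interposed read(): because the library is loaded via LD_PRELOAD with the
 * highest priority, this definition is resolved before the libc one, so
 * existing HPC applications need no recompilation. */
ssize_t read(int fd, void *buf, size_t count)
{
    static ssize_t (*real_read)(int, void *, size_t) = NULL;
    if (!real_read)
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");

    ssize_t served;
    if (redirect_to_aggregation_cache(fd, buf, count, &served))
        return served;                     /* read-cache hit path        */

    return real_read(fd, buf, count);      /* fall back to the libc read */
}

A job would then be launched with something like LD_PRELOAD=/path/to/Only_Read_Performance.so so that the loader resolves read() from this library first.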

As an optional embodiment, the process of redirecting the file IO operation request to the server process includes:

redirecting the file IO operation request to the server process through a hash algorithm.

In this embodiment, redirection through a hash algorithm avoids the metadata lookup bottleneck, with the goal of improving random read performance. The layered-cache client process uses a hash to redirect IO and looks the cache up on the layered-cache server process, so that cached file metadata does not have to be stored in distributed metadata storage or in an in-memory database. In the aggregation cache, a file's cache location is determined from the file path and the node to which it belongs.
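A minimal sketch, under assumptions, of how a file path could be hashed to select the cache-owning node without consulting any metadata store; the FNV-1a hash and the node-count parameter are illustrative choices, since the application only specifies that a hash algorithm redirects the request.

#include <stdint.h>

/* FNV-1a hash of the file path (illustrative choice of hash function). */
static uint64_t fnv1a(const char *s)
{
    uint64_t h = 1469598103934665603ULL;
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Map a file path onto one of n_nodes layered-cache server processes; the
 * same path always maps to the same node, so the cache location is known
 * without any distributed metadata lookup. */
unsigned pick_cache_node(const char *path, unsigned n_nodes)
{
    return (unsigned)(fnv1a(path) % n_nodes);
}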

As an optional embodiment, the aggregation cache layer is a cache layer composed of the high-speed storage media in the computing nodes of the distributed storage system.

As an optional embodiment, the high-speed storage medium is an NVMe (Non-Volatile Memory express) SSD (Solid State Drive).

As an optional embodiment, the layered caching method further includes:

when a clearing condition is met, clearing the data stored in the high-speed storage medium on the local computing node.

In this embodiment, the life cycle of a data set in the cache is coupled with the life cycle of the job on the HPC client; after the job is completed, the cached data set is cleared from the node's local storage.

In summary, the aggregation cache layer proposed in this application is independent of the metadata-module and data-module storage mechanisms of the distributed storage system. It is a transparent read-only cache layer that builds a cache for the typical intensive IO model (formula shown as Figure SMS_6). Based on the RPC remote-procedure-call mechanism, it aggregates the local storage of distributed cluster nodes and their adjacent nodes to accelerate data-read performance and thus improve the IO performance of massive small-file data sets. A distributed hash algorithm is used for IO redirection to determine the cache location of a data request, and a highest-priority dynamic link library, Only_Read_Performance.so, is designed on the basis of the LD_PRELOAD environment variable to intercept read IO requests, avoiding a metadata lookup bottleneck in the distributed storage system and aiming to improve random read performance. At the same time, the cache hit rate of the distributed storage system is increased, effectively avoiding the IO performance bottlenecks caused by metadata lookup and file-lock contention; and since the aggregation cache layer of this application is independent of the distributed storage system, it is also portable and general-purpose.

In terms of performance, this application effectively addresses the performance problems of the highly concurrent IO model in high-performance HPC scenarios and alleviates the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system workloads. In terms of stability, the cache aggregation layer is designed to be independent of the underlying distributed storage system, so a failure of a storage medium in the cache aggregation layer does not affect normal business calls. In terms of security, the layered caching architecture is loosely coupled with the distributed storage system and introduces no security risk. In terms of cost, this application addresses the common storage IO performance problems of high-performance HPC scenarios, which can improve the competitiveness and reduce the maintenance cost of distributed file storage. In terms of compatibility, this application is portable and general-purpose, requires no modification of HPC job applications, improves the linear scalability of distributed clusters, and is compatible with common features such as file quotas and snapshots.

In a second aspect, referring to Figure 4, Figure 4 is a schematic structural diagram of a layered caching system provided by the present application, which is applied to each computing node of a distributed storage system. The layered caching system includes:

a monitoring module 41, configured to use a client process to monitor file IO operation requests issued by a client to the distributed storage system and, when a file IO operation request is detected, redirect the file IO operation request to a server process;

a processing module 42, configured to use the server process to determine whether the target storage location corresponding to the file IO operation request is the aggregation cache layer, and if not, trigger a first reading module 43, and if so, trigger a second reading module 44;

the first reading module 43, configured to read the data corresponding to the file IO operation request from the underlying layer of the distributed storage system and cache the data in the aggregation cache layer;

the second reading module 44, configured to read the data from the aggregation cache layer and return the data to the client process, so that the client process returns the data to the client.

This embodiment proposes an aggregation cache layer, which is a transparent read-only cache layer. A cache is built for the typical intensive IO model (formula shown as Figure SMS_7), and the local storage of distributed cluster nodes and their adjacent nodes is aggregated to accelerate data-read performance, thereby improving the IO performance of massive small-file data sets.

The client accesses the distributed storage system through the POSIX file system interface provided by the distributed storage system, so as to accelerate storage IO access performance in high-performance HPC scenarios; such scenarios feature read-only data with a high re-read rate and follow the typical intensive IO model (formula shown as Figure SMS_8). Referring to Figure 2, the architecture of the layered caching system provided by this application consists of two main components, a layered-cache client process and a layered-cache server process, which are deployed at the client layer of the distributed storage system. When a job is assigned to a group of computing nodes on the HPC cluster, the layered-cache server process is started, and a server process instance is dynamically constructed using the local storage of the distributed storage node and its adjacent nodes. Each node of the distributed storage system deploys a layered-cache client process and server process, which cache the data requested by HPC jobs on the node's NVMe SSD high-speed storage device.

Specifically, the layered caching flow of this application is described with reference to Figure 3. First, the client process is preloaded; the client process monitors and intercepts file system operations such as open, read and close. File system calls are intercepted through this process, so no modification of existing high-performance computing applications or of the underlying file system of the distributed storage system is required. It can be understood that the layered-cache client process consists of a file system IO interface forwarding module, which captures file system calls to the distributed storage system and redirects them to the corresponding layered-cache server process. Therefore, the layered-cache client process first reads hit data through the aggregation cache layer, which helps meet the performance requirements of high-performance workloads.

After receiving a file IO operation request intercepted by the client process, the server process retrieves the file from the underlying layer of the distributed storage system only on a cache miss. It can be understood that two reading scenarios are described below: a first read and a non-first read.

For the first read:

A computing node client in the high-performance HPC scenario initiates a read request for a data set directory on the distributed storage system. The layered-cache client process intercepts any incoming file IO operation request and starts tracking it in the data set directory. The RPC (remote procedure call) handler of the layered-cache client process redirects the requested file IO operation to the corresponding layered-cache server process; the internal RPC handlers of the layered-cache client process and server process are responsible for sending and receiving messages over the network.

As an optional embodiment, the process of using the server process to determine whether the target storage location corresponding to the file IO operation request is the aggregation cache layer includes: using the server process to insert the received file IO operation request into a shared queue; in the shared queue, determining whether the data corresponding to the file IO operation request is cached data; and if so, determining that the target storage location corresponding to the file IO operation request is the aggregation cache layer. As an optional embodiment, the process of determining, in the shared queue, whether the data corresponding to the file IO operation request is cached data includes: in the shared queue, determining through a data thread whether the data corresponding to the file IO operation request is cached data, where the data thread is a thread created when the server process instance is constructed.

Specifically, when the layered-cache server process receives a file IO operation request, the RPC handler inserts the forwarded file IO into the shared FIFO queue. In the shared FIFO queue, the move-data thread checks whether the file has already been cached. Since the file is being read for the first time, the data needs to be pulled into the aggregation cache, and the cached file descriptor, read offset and length are determined.

For a non-first read:

As an optional embodiment, the process of reading data from the aggregation cache layer includes:

redirecting the file IO operation request to the aggregation cache layer through the data thread, so as to read the data from the aggregation cache layer.

Specifically, the data thread redirects the IO to the aggregation cache to read the file and returns the file descriptor to the corresponding layered-cache client process; that process then returns the file descriptor, read offset, length and other data to the HPC application, i.e., the HPC high-performance client requester.

It can be seen that, in this embodiment, after the client process detects a file IO operation request, it redirects the request to the server process; the server process first looks up the file in the aggregation cache layer and only retrieves the data from the underlying layer of the distributed storage system on an aggregation cache miss, thereby improving the IO performance of massive small-file data sets and alleviating the performance bottleneck caused by metadata in highly concurrent, metadata-intensive file system workloads.

作为一种可选的实施例,利用服务端进程判断文件IO操作请求对应的目标存储位置是否为聚合缓存层的过程包括:As an optional embodiment, the process of using the server process to determine whether the target storage location corresponding to the file IO operation request is the aggregation cache layer includes:

利用服务端进程将接收到的文件IO操作请求插入到共享队列中;Use the server process to insert the received file IO operation request into the shared queue;

在共享队列中,确定文件IO操作请求对应的数据是否为已缓存数据;In the shared queue, determine whether the data corresponding to the file IO operation request is cached data;

若是,则判定文件IO操作请求对应的目标存储位置为聚合缓存层。If yes, it is determined that the target storage location corresponding to the file IO operation request is the aggregation cache layer.

作为一种可选的实施例,在共享队列中,确定文件IO操作请求对应的数据是否为已缓存数据的过程包括:As an optional embodiment, in the shared queue, the process of determining whether the data corresponding to the file IO operation request is cached data includes:

在共享队列中,通过数据线程确定文件IO操作请求对应的数据是否为已缓存数据;数据线程为构建服务端进程实例时生成的线程。In the shared queue, the data thread is used to determine whether the data corresponding to the file IO operation request is cached data; the data thread is the thread generated when the server process instance is constructed.

作为一种可选的实施例,分层缓存系统还包括:As an optional embodiment, the hierarchical caching system further includes:

预处理模块,用于当监测客户端向分布式存储系统发出的文件IO操作请求,启动服务端进程,利用本计算节点的状态和与本计算节点相邻的其他计算节点的状态动态构建服务端进程实例。The preprocessing module is used to monitor the file IO operation request sent by the client to the distributed storage system, start the server process, and dynamically build the server by using the status of the computing node and the status of other computing nodes adjacent to the computing node process instance.

作为一种可选的实施例,从聚合缓存层中读取数据的过程包括:As an optional embodiment, the process of reading data from the aggregation cache layer includes:

通过数据线程将文件IO操作请求重定向至聚合缓存层,以便从聚合缓存层读取数据。Redirect the file IO operation request to the aggregation cache layer through the data thread, so as to read data from the aggregation cache layer.

作为一种可选的实施例,利用服务端进程将接收到的文件IO操作请求插入到共享队列中的同时,该分层缓存系统还包括:As an optional embodiment, while using the server process to insert the received file IO operation request into the shared queue, the hierarchical caching system also includes:

配置模块,用于将共享队列配置互斥锁。The configuration module is used to configure a mutual exclusion lock for a shared queue.

作为一种可选的实施例,共享队列为FIFO队列。As an optional embodiment, the shared queue is a FIFO queue.

作为一种可选的实施例,文件IO操作请求对应的数据包括文件描述符、读取偏移量和长度。As an optional embodiment, the data corresponding to the file IO operation request includes a file descriptor, a read offset, and a length.

作为一种可选的实施例,分层缓存系统还包括:As an optional embodiment, the hierarchical caching system further includes:

确定模块,用于基于文件路径和所属计算节点确定数据在聚合缓存层中的存储位置。The determination module is configured to determine the storage location of the data in the aggregation cache layer based on the file path and the computing node to which it belongs.

作为一种可选的实施例,分层缓存系统还包括:As an optional embodiment, the hierarchical caching system further includes:

广播模块,用于将文件IO操作请求广播给与本计算节点相邻的计算节点。The broadcast module is configured to broadcast the file IO operation request to computing nodes adjacent to the computing node.

作为一种可选的实施例,分层缓存系统还包括:As an optional embodiment, the hierarchical caching system further includes:

判断模块,用于判断文件IO操作请求对应的数据集是否大于本地存储介质的总容量,若是,执行缓存逐出操作和替换操作。A judging module, configured to judge whether the data set corresponding to the file IO operation request is larger than the total capacity of the local storage medium, and if so, perform a cache eviction operation and a replacement operation.

作为一种可选的实施例,分层缓存系统还包括:As an optional embodiment, the hierarchical caching system further includes:

构建模块,用于基于环境变量构建动态链接库;动态链接库用于拦截文件IO操作请求。The building module is used to build a dynamic link library based on environment variables; the dynamic link library is used to intercept file IO operation requests.

作为一种可选的实施例,将文件IO操作请求重定向到服务端进程的过程包括:As an optional embodiment, the process of redirecting the file IO operation request to the server process includes:

通过哈希算法将文件IO操作请求重定向到服务端进程。Redirect the file IO operation request to the server process through a hash algorithm.
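A hedged sketch of hash-based redirection, assuming the request is keyed by its file path and that server process instances are addressed by index; both assumptions are made for illustration, as the patent only states that a hash algorithm is used.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Route a request to one of `instance_count` server process instances.
size_t PickServerInstance(const std::string& file_path, size_t instance_count) {
    return std::hash<std::string>{}(file_path) % instance_count;
}
```

Keying the hash on the file path keeps repeated accesses to the same file on the same server instance, which is why a path-based key is a natural choice here.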

作为一种可选的实施例,聚合缓存层为由分布式存储系统中各个计算节点中的高速存储介质构成的缓存层。As an optional embodiment, the aggregation cache layer is a cache layer composed of high-speed storage media in each computing node in the distributed storage system.

作为一种可选的实施例，高速存储介质为NVMe SSD。As an optional embodiment, the high-speed storage medium is an NVMe SSD.

作为一种可选的实施例,分层缓存系统还包括:As an optional embodiment, the hierarchical caching system further includes:

清除模块，用于当满足清除条件，清除本计算节点上中高速存储介质中存储的数据。The clearing module is configured to clear the data stored in the high-speed storage medium on the current computing node when the clearing condition is met.

第三方面,本申请还提供了一种电子设备,包括:In a third aspect, the present application also provides an electronic device, including:

存储器,用于存储计算机程序;memory for storing computer programs;

处理器,用于执行计算机程序时实现如上文任意一项的分层缓存方法的步骤。A processor, configured to implement the steps of any one of the above hierarchical cache methods when executing a computer program.

具体的，存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统和计算机可读指令，该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。处理器为电子设备提供计算和控制能力，执行存储器中保存的计算机程序时，可以实现以下步骤：利用客户端进程监测客户端向分布式存储系统发出的文件IO操作请求，当监测到文件IO操作请求，将文件IO操作请求重定向到服务端进程；利用服务端进程判断文件IO操作请求对应的目标存储位置是否为聚合缓存层；若否，从分布式存储系统底层读取文件IO操作请求对应的数据，并将数据缓存到聚合缓存层；若是，从聚合缓存层中读取数据，并将数据返回到客户端进程，以便客户端进程将数据返回至客户端。Specifically, the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and computer-readable instructions, and the internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The processor provides computing and control capabilities for the electronic device. When executing the computer program stored in the memory, the processor can implement the following steps: using the client process to monitor the file IO operation request sent by the client to the distributed storage system, and redirecting the file IO operation request to the server process when the file IO operation request is detected; using the server process to determine whether the target storage location corresponding to the file IO operation request is the aggregation cache layer; if not, reading the data corresponding to the file IO operation request from the bottom layer of the distributed storage system and caching the data to the aggregation cache layer; if so, reading the data from the aggregation cache layer and returning the data to the client process, so that the client process returns the data to the client.
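For illustration of the read path just described, the following hedged C++ sketch shows a cache-first lookup with fallback to the storage bottom layer. The AggregationCache and StorageBackend types, their member functions and the stand-in payload are all placeholders invented for this sketch rather than interfaces from the patent.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <unordered_map>
#include <vector>

// Placeholder aggregation cache backed by an in-memory map; a real one would
// sit on the NVMe-backed cache layer shared by the compute nodes.
struct AggregationCache {
    std::unordered_map<std::string, std::vector<char>> files;
    bool Contains(const std::string& path) const { return files.count(path) > 0; }
    std::vector<char> Read(const std::string& path, off_t off, size_t len) const {
        const auto& f = files.at(path);
        off = std::min<off_t>(off, static_cast<off_t>(f.size()));
        len = std::min(len, f.size() - static_cast<size_t>(off));
        return std::vector<char>(f.begin() + off, f.begin() + off + len);
    }
    void Insert(const std::string& path, std::vector<char> data) {
        files[path] = std::move(data);
    }
};

// Placeholder for the distributed storage bottom layer.
struct StorageBackend {
    std::vector<char> ReadWholeFile(const std::string&) const {
        return std::vector<char>(4096, 'x');  // stand-in payload
    }
};

// Server-side handling of a redirected read: aggregation cache first,
// fall back to the bottom layer and warm the cache on a miss.
std::vector<char> HandleRead(AggregationCache& cache, StorageBackend& backend,
                             const std::string& path, off_t off, size_t len) {
    if (!cache.Contains(path)) {
        cache.Insert(path, backend.ReadWholeFile(path));  // cache miss: populate
    }
    return cache.Read(path, off, len);  // hit path, returned to the client process
}
```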

可见，本实施例中，客户端进程监测到文件IO操作请求后，将文件IO操作请求重定向到服务端进程，服务端进程先在聚合缓存层中检索文件，在聚合缓存层未命中时，再从分布式存储系统底层检索数据，以提高海量小文件数据集的IO性能，改善高并发元数据密集型文件系统业务中元数据造成的性能瓶颈。It can be seen that, in this embodiment, after the client process detects the file IO operation request, the file IO operation request is redirected to the server process; the server process first looks the file up in the aggregation cache layer and only retrieves the data from the bottom layer of the distributed storage system when the aggregation cache layer misses, which improves the IO performance of massive small-file data sets and alleviates the performance bottleneck caused by metadata in high-concurrency, metadata-intensive file system workloads.

作为一种可选的实施例，处理器执行存储器中保存的计算机子程序时，可以实现以下步骤：利用服务端进程将接收到的文件IO操作请求插入到共享队列中；在共享队列中，确定文件IO操作请求对应的数据是否为已缓存数据；若是，则判定文件IO操作请求对应的目标存储位置为聚合缓存层。As an optional embodiment, when the processor executes the computer subprogram stored in the memory, the following steps can be implemented: using the server process to insert the received file IO operation request into the shared queue; in the shared queue, determining whether the data corresponding to the file IO operation request is cached data; if so, determining that the target storage location corresponding to the file IO operation request is the aggregation cache layer.

作为一种可选的实施例，处理器执行存储器中保存的计算机子程序时，可以实现以下步骤：在共享队列中，通过数据线程确定文件IO操作请求对应的数据是否为已缓存数据；数据线程为构建服务端进程实例时生成的线程。As an optional embodiment, when the processor executes the computer subprogram stored in the memory, the following steps can be implemented: in the shared queue, determining through a data thread whether the data corresponding to the file IO operation request is cached data; the data thread is a thread generated when the server process instance is constructed.

作为一种可选的实施例，处理器执行存储器中保存的计算机子程序时，可以实现以下步骤：当监测客户端向分布式存储系统发出的文件IO操作请求，启动服务端进程，利用本计算节点的状态和与本计算节点相邻的其他计算节点的状态动态构建服务端进程实例。As an optional embodiment, when the processor executes the computer subprogram stored in the memory, the following steps can be implemented: when the file IO operation request sent by the client to the distributed storage system is detected, starting the server process and dynamically constructing the server process instance based on the status of the current computing node and the status of the other computing nodes adjacent to it.

作为一种可选的实施例,处理器执行存储器中保存的计算机子程序时,可以实现以下步骤:通过数据线程将文件IO操作请求重定向至聚合缓存层,以便从聚合缓存层读取数据。As an optional embodiment, when the processor executes the computer subroutine stored in the memory, the following steps may be implemented: redirecting the file IO operation request to the aggregation cache layer through the data thread, so as to read data from the aggregation cache layer.

作为一种可选的实施例,处理器执行存储器中保存的计算机子程序时,可以实现以下步骤:将共享队列配置互斥锁。As an optional embodiment, when the processor executes the computer subroutine stored in the memory, the following steps may be implemented: configuring the shared queue with a mutual exclusion lock.

作为一种可选的实施例,处理器执行存储器中保存的计算机子程序时,可以实现以下步骤:基于文件路径和所属计算节点确定数据在聚合缓存层中的存储位置。As an optional embodiment, when the processor executes the computer subroutine stored in the memory, the following steps may be implemented: determining the storage location of the data in the aggregation cache layer based on the file path and the computing node to which it belongs.

作为一种可选的实施例,处理器执行存储器中保存的计算机子程序时,可以实现以下步骤:将文件IO操作请求广播给与本计算节点相邻的计算节点。As an optional embodiment, when the processor executes the computer subroutine stored in the memory, the following steps may be implemented: broadcasting the file IO operation request to computing nodes adjacent to the computing node.

作为一种可选的实施例，处理器执行存储器中保存的计算机子程序时，可以实现以下步骤：判断文件IO操作请求对应的数据集是否大于本地存储介质的总容量；若是，执行缓存逐出操作和替换操作。As an optional embodiment, when the processor executes the computer subprogram stored in the memory, the following steps can be implemented: determining whether the data set corresponding to the file IO operation request is greater than the total capacity of the local storage medium; if so, performing a cache eviction operation and a replacement operation.

作为一种可选的实施例,处理器执行存储器中保存的计算机子程序时,可以实现以下步骤:基于环境变量构建动态链接库;动态链接库用于拦截文件IO操作请求。As an optional embodiment, when the processor executes the computer subroutine stored in the memory, the following steps may be implemented: building a dynamic link library based on environment variables; the dynamic link library is used to intercept file IO operation requests.

作为一种可选的实施例,处理器执行存储器中保存的计算机子程序时,可以实现以下步骤:通过哈希算法将文件IO操作请求重定向到服务端进程。As an optional embodiment, when the processor executes the computer subroutine stored in the memory, the following steps may be implemented: redirecting the file IO operation request to the server process through a hash algorithm.

作为一种可选的实施例,处理器执行存储器中保存的计算机子程序时,可以实现以下步骤:当满足清除条件,清除本计算节点上中高速存储介质中存储的数据。As an optional embodiment, when the processor executes the computer subroutine stored in the memory, the following steps may be implemented: when the clearing condition is met, clear the data stored in the high-speed storage medium on the computing node.

在上述实施例的基础上,作为优选实施方式,该电子设备还包括:On the basis of the foregoing embodiments, as a preferred implementation manner, the electronic device further includes:

输入接口,与处理器相连,用于获取外部导入的计算机程序、参数和指令,经处理器控制保存至存储器中。该输入接口可以与输入装置相连,接收用户手动输入的参数或指令。该输入装置可以是显示屏上覆盖的触摸层,也可以是终端外壳上设置的按键、轨迹球或触控板。The input interface is connected with the processor, and is used to obtain the computer program, parameters and instructions imported from the outside, and store them in the memory under the control of the processor. The input interface can be connected with an input device to receive parameters or instructions manually input by the user. The input device may be a touch layer covered on the display screen, or a button, a trackball or a touch pad provided on the terminal shell.

显示单元,与处理器相连,用于显示处理器发送的数据。该显示单元可以为液晶显示屏或者电子墨水显示屏等。The display unit is connected with the processor and used for displaying the data sent by the processor. The display unit may be a liquid crystal display or an electronic ink display.

网络端口,与处理器相连,用于与外部各终端设备进行通信连接。该通信连接所采用的通信技术可以为有线通信技术或无线通信技术,如移动高清链接技术(MHL)、通用串行总线(USB)、高清多媒体接口(HDMI)、无线保真技术(WiFi)、蓝牙通信技术、低功耗蓝牙通信技术、基于IEEE802.11s的通信技术等。The network port is connected with the processor and is used for communication connection with various external terminal devices. The communication technology used in the communication connection can be wired communication technology or wireless communication technology, such as mobile high-definition link technology (MHL), universal serial bus (USB), high-definition multimedia interface (HDMI), wireless fidelity technology (WiFi), Bluetooth communication technology, low-power Bluetooth communication technology, communication technology based on IEEE802.11s, etc.

第四方面，本申请还提供了一种分布式存储系统，包括存储底层模块和多个节点，每个节点均包括分层客户端进程、分层服务端进程及存储介质，各个节点的存储介质构成聚合缓存层，其中：In a fourth aspect, the present application also provides a distributed storage system, including a storage bottom-layer module and a plurality of nodes; each node includes a layered client process, a layered server process and a storage medium, and the storage media of the nodes constitute an aggregation cache layer, wherein:

分层客户端进程,用于监测客户端发出的文件IO操作请求,当监测到文件IO操作请求,将文件IO操作请求重定向到分层服务端进程;The layered client process is used to monitor the file IO operation request sent by the client, and when the file IO operation request is detected, the file IO operation request is redirected to the layered server process;

分层服务端进程，用于判断文件IO操作请求对应的目标存储位置是否为聚合缓存层，若否，从存储底层模块读取文件IO操作请求对应的数据，并将数据发送到聚合缓存层，若是，从聚合缓存层中读取数据，并将数据返回到分层客户端进程，以便分层客户端进程将数据返回至客户端；The layered server process is used to determine whether the target storage location corresponding to the file IO operation request is the aggregation cache layer; if not, it reads the data corresponding to the file IO operation request from the storage bottom-layer module and sends the data to the aggregation cache layer; if so, it reads the data from the aggregation cache layer and returns the data to the layered client process, so that the layered client process returns the data to the client;

聚合缓存层,用于存储分层服务端进程发送的数据。The aggregation cache layer is used to store the data sent by the layered server process.

第五方面,本申请还提供了一种计算机可读存储介质,计算机可读存储介质上存储有计算机程序,计算机程序被处理器执行时实现如上文任意一项的分层缓存方法的步骤。In the fifth aspect, the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of any one of the above-mentioned layered caching methods are implemented.

具体的，计算机可读存储介质可以包括：U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。该存储介质上存储有计算机程序，计算机程序被处理器执行时实现以下步骤：利用客户端进程监测客户端向分布式存储系统发出的文件IO操作请求，当监测到文件IO操作请求，将文件IO操作请求重定向到服务端进程；利用服务端进程判断文件IO操作请求对应的目标存储位置是否为聚合缓存层；若否，从分布式存储系统底层读取文件IO操作请求对应的数据，并将数据缓存到聚合缓存层；若是，从聚合缓存层中读取数据，并将数据返回到客户端进程，以便客户端进程将数据返回至客户端。Specifically, the computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk. A computer program is stored on the storage medium, and when the computer program is executed by the processor, the following steps are implemented: using the client process to monitor the file IO operation request sent by the client to the distributed storage system, and redirecting the file IO operation request to the server process when the file IO operation request is detected; using the server process to determine whether the target storage location corresponding to the file IO operation request is the aggregation cache layer; if not, reading the data corresponding to the file IO operation request from the bottom layer of the distributed storage system and caching the data to the aggregation cache layer; if so, reading the data from the aggregation cache layer and returning the data to the client process, so that the client process returns the data to the client.

可见，本实施例中，客户端进程监测到文件IO操作请求后，将文件IO操作请求重定向到服务端进程，服务端进程先在聚合缓存层中检索文件，在聚合缓存层未命中时，再从分布式存储系统底层检索数据，以提高海量小文件数据集的IO性能，改善高并发元数据密集型文件系统业务中元数据造成的性能瓶颈。It can be seen that, in this embodiment, after the client process detects the file IO operation request, the file IO operation request is redirected to the server process; the server process first looks the file up in the aggregation cache layer and only retrieves the data from the bottom layer of the distributed storage system when the aggregation cache layer misses, which improves the IO performance of massive small-file data sets and alleviates the performance bottleneck caused by metadata in high-concurrency, metadata-intensive file system workloads.

作为一种可选的实施例,计算机可读存储介质中存储的计算机子程序被处理器执行时,具体可以实现以下步骤:利用服务端进程将接收到的文件IO操作请求插入到共享队列中;在共享队列中,确定文件IO操作请求对应的数据是否为已缓存数据;若是,则判定文件IO操作请求对应的目标存储位置为聚合缓存层。As an optional embodiment, when the computer subroutine stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: using the server process to insert the received file IO operation request into the shared queue; In the shared queue, it is determined whether the data corresponding to the file IO operation request is cached data; if so, it is determined that the target storage location corresponding to the file IO operation request is the aggregation cache layer.

作为一种可选的实施例，计算机可读存储介质中存储的计算机子程序被处理器执行时，具体可以实现以下步骤：在共享队列中，通过数据线程确定文件IO操作请求对应的数据是否为已缓存数据；数据线程为构建服务端进程实例时生成的线程。As an optional embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps can be specifically implemented: in the shared queue, determining through a data thread whether the data corresponding to the file IO operation request is cached data; the data thread is a thread generated when the server process instance is constructed.

作为一种可选的实施例，计算机可读存储介质中存储的计算机子程序被处理器执行时，具体可以实现以下步骤：当监测客户端向分布式存储系统发出的文件IO操作请求，启动服务端进程，利用本计算节点的状态和与本计算节点相邻的其他计算节点的状态动态构建服务端进程实例。As an optional embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps can be specifically implemented: when the file IO operation request sent by the client to the distributed storage system is detected, starting the server process and dynamically constructing the server process instance based on the status of the current computing node and the status of the other computing nodes adjacent to it.

作为一种可选的实施例，计算机可读存储介质中存储的计算机子程序被处理器执行时，具体可以实现以下步骤：通过数据线程将文件IO操作请求重定向至聚合缓存层，以便从聚合缓存层读取数据。As an optional embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps can be specifically implemented: redirecting the file IO operation request to the aggregation cache layer through the data thread, so that the data can be read from the aggregation cache layer.

作为一种可选的实施例,计算机可读存储介质中存储的计算机子程序被处理器执行时,具体可以实现以下步骤:将共享队列配置互斥锁。As an optional embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: configuring the shared queue with a mutual exclusion lock.

作为一种可选的实施例,计算机可读存储介质中存储的计算机子程序被处理器执行时,具体可以实现以下步骤:基于文件路径和所属计算节点确定数据在聚合缓存层中的存储位置。As an optional embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: determining the storage location of the data in the aggregation cache layer based on the file path and the computing node to which it belongs.

作为一种可选的实施例,计算机可读存储介质中存储的计算机子程序被处理器执行时,具体可以实现以下步骤:将文件IO操作请求广播给与本计算节点相邻的计算节点。As an optional embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: broadcasting the file IO operation request to computing nodes adjacent to the computing node.

作为一种可选的实施例，计算机可读存储介质中存储的计算机子程序被处理器执行时，具体可以实现以下步骤：判断文件IO操作请求对应的数据集是否大于本地存储介质的总容量；若是，执行缓存逐出操作和替换操作。As an optional embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps can be specifically implemented: determining whether the data set corresponding to the file IO operation request is greater than the total capacity of the local storage medium; if so, performing cache eviction and replacement operations.

作为一种可选的实施例,计算机可读存储介质中存储的计算机子程序被处理器执行时,具体可以实现以下步骤:基于环境变量构建动态链接库;动态链接库用于拦截文件IO操作请求。As an optional embodiment, when the computer subroutine stored in the computer-readable storage medium is executed by the processor, the following steps can be specifically implemented: constructing a dynamic link library based on environment variables; the dynamic link library is used to intercept file IO operation requests .

作为一种可选的实施例,计算机可读存储介质中存储的计算机子程序被处理器执行时,具体可以实现以下步骤:通过哈希算法将文件IO操作请求重定向到服务端进程。As an optional embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps may be specifically implemented: redirecting the file IO operation request to the server process through a hash algorithm.

作为一种可选的实施例，计算机可读存储介质中存储的计算机子程序被处理器执行时，具体可以实现以下步骤：当满足清除条件，清除本计算节点上中高速存储介质中存储的数据。As an optional embodiment, when the computer subprogram stored in the computer-readable storage medium is executed by the processor, the following steps can be specifically implemented: when the clearing condition is met, clearing the data stored in the high-speed storage medium on the current computing node.

还需要说明的是，在本说明书中，诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来，而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且，术语"包括"、"包含"或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的状况下，由语句"包括一个……"限定的要素，并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。It should also be noted that, in this specification, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements, but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.

对所公开的实施例的上述说明,使本领域专业技术人员能够实现或使用本申请。对这些实施例的多种修改对本领域的专业技术人员来说将是显而易见的,本文中所定义的一般原理可以在不脱离本申请的精神或范围的情况下,在其他实施例中实现。因此,本申请将不会被限制于本文所示的这些实施例,而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。The above description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the application. Therefore, the present application will not be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (20)

1.一种分层缓存方法，其特征在于，应用于分布式存储系统的每一计算节点，所述分布式存储系统包括客户端层，所述客户端层部署客户端进程和服务端进程，所述分层缓存方法包括：1. A layered cache method, characterized in that it is applied to each computing node of a distributed storage system, wherein the distributed storage system includes a client layer, and the client layer deploys a client process and a server process; the layered cache method includes:
利用所述客户端进程监测客户端向所述分布式存储系统发出的文件IO操作请求，当监测到所述文件IO操作请求，将所述文件IO操作请求重定向到所述服务端进程；Using the client process to monitor the file IO operation request sent by the client to the distributed storage system, and redirecting the file IO operation request to the server process when the file IO operation request is detected;
利用所述服务端进程判断所述文件IO操作请求对应的目标存储位置是否为聚合缓存层；Using the server process to determine whether the target storage location corresponding to the file IO operation request is an aggregation cache layer;
若否，从所述分布式存储系统底层读取所述文件IO操作请求对应的数据，并将所述数据缓存到所述聚合缓存层；If not, read the data corresponding to the file IO operation request from the bottom layer of the distributed storage system, and cache the data to the aggregation cache layer;
若是，从所述聚合缓存层中读取所述数据，并将所述数据返回到所述客户端进程，以便所述客户端进程将所述数据返回至所述客户端。If so, read the data from the aggregation cache layer, and return the data to the client process, so that the client process returns the data to the client.

2.根据权利要求1所述的分层缓存方法，其特征在于，利用所述服务端进程判断所述文件IO操作请求对应的目标存储位置是否为聚合缓存层的过程包括：2. The layered cache method according to claim 1, wherein the process of using the server process to judge whether the target storage location corresponding to the file IO operation request is an aggregation cache layer comprises:
利用所述服务端进程将接收到的所述文件IO操作请求插入到共享队列中；Utilize the server process to insert the received file IO operation request into a shared queue;
在所述共享队列中，确定所述文件IO操作请求对应的数据是否为已缓存数据；In the shared queue, determine whether the data corresponding to the file IO operation request is cached data;
若是，则判定所述文件IO操作请求对应的目标存储位置为聚合缓存层。If yes, it is determined that the target storage location corresponding to the file IO operation request is the aggregation cache layer.

3.根据权利要求2所述的分层缓存方法，其特征在于，在所述共享队列中，确定所述文件IO操作请求对应的数据是否为已缓存数据的过程包括：3. The layered cache method according to claim 2, wherein, in the shared queue, the process of determining whether the data corresponding to the file IO operation request is cached data comprises:
在所述共享队列中，通过数据线程确定所述文件IO操作请求对应的数据是否为已缓存数据；所述数据线程为构建服务端进程实例时生成的线程。In the shared queue, a data thread is used to determine whether the data corresponding to the file IO operation request is cached data; the data thread is a thread generated when the server process instance is constructed.

4.根据权利要求3所述的分层缓存方法，其特征在于，所述分层缓存方法还包括：4. The layered caching method according to claim 3, wherein the layered caching method further comprises:
当监测到所述客户端向所述分布式存储系统发出的文件IO操作请求，启动所述服务端进程，利用本计算节点的状态和与本计算节点相邻的其他计算节点的状态动态构建所述服务端进程实例。When the file IO operation request sent by the client to the distributed storage system is detected, the server process is started, and the status of the current computing node and the status of the other computing nodes adjacent to it are used to dynamically construct the server process instance.

5.根据权利要求3所述的分层缓存方法，其特征在于，从所述聚合缓存层中读取所述数据的过程包括：5. The layered cache method according to claim 3, wherein the process of reading the data from the aggregation cache layer comprises:
通过数据线程将所述文件IO操作请求重定向至所述聚合缓存层，以便从所述聚合缓存层读取所述数据。The file IO operation request is redirected to the aggregation cache layer through a data thread, so as to read the data from the aggregation cache layer.

6.根据权利要求2所述的分层缓存方法，其特征在于，利用所述服务端进程将接收到的所述文件IO操作请求插入到共享队列中的同时，该分层缓存方法还包括：6. The layered caching method according to claim 2, wherein, while utilizing the server process to insert the received file IO operation request into the shared queue, the layered caching method also includes:
将所述共享队列配置互斥锁。Configure the shared queue with a mutex.

7.根据权利要求2所述的分层缓存方法，其特征在于，所述共享队列为FIFO队列。7. The hierarchical caching method according to claim 2, wherein the shared queue is a FIFO queue.

8.根据权利要求1所述的分层缓存方法，其特征在于，所述文件IO操作请求对应的数据包括文件描述符、读取偏移量和长度。8. The hierarchical caching method according to claim 1, wherein the data corresponding to the file IO operation request includes a file descriptor, read offset and length.

9.根据权利要求1所述的分层缓存方法，其特征在于，所述分层缓存方法还包括：9. The layered caching method according to claim 1, wherein the layered caching method further comprises:
基于文件路径和所属计算节点确定所述数据在所述聚合缓存层中的存储位置。The storage location of the data in the aggregation cache layer is determined based on the file path and the computing node to which it belongs.

10.根据权利要求1所述的分层缓存方法，其特征在于，所述分层缓存方法还包括：10. The layered cache method according to claim 1, wherein the layered cache method further comprises:
将所述文件IO操作请求广播给与本计算节点相邻的计算节点。Broadcast the file IO operation request to computing nodes adjacent to the current computing node.

11.根据权利要求1所述的分层缓存方法，其特征在于，所述分层缓存方法还包括：11. The layered caching method according to claim 1, wherein the layered caching method further comprises:
判断所述文件IO操作请求对应的数据集是否大于本地存储介质的总容量；Judging whether the data set corresponding to the file IO operation request is greater than the total capacity of the local storage medium;
若是，执行缓存逐出操作和替换操作。If so, perform cache eviction and replacement operations.

12.根据权利要求1所述的分层缓存方法，其特征在于，所述分层缓存方法还包括：12. The layered caching method according to claim 1, wherein the layered caching method further comprises:
基于环境变量构建动态链接库；所述动态链接库用于拦截所述文件IO操作请求。A dynamic link library is built based on environment variables; the dynamic link library is used to intercept the file IO operation request.

13.根据权利要求1所述的分层缓存方法，其特征在于，将所述文件IO操作请求重定向到服务端进程的过程包括：13. The layered caching method according to claim 1, wherein the process of redirecting the file IO operation request to the server process comprises:
通过哈希算法将所述文件IO操作请求重定向到服务端进程。The file IO operation request is redirected to the server process through a hash algorithm.

14.根据权利要求1-13任意一项所述的分层缓存方法，其特征在于，所述聚合缓存层为由所述分布式存储系统中各个所述计算节点中的高速存储介质构成的缓存层。14. The hierarchical caching method according to any one of claims 1-13, wherein the aggregation cache layer is a cache layer composed of the high-speed storage media in each of the computing nodes in the distributed storage system.

15.根据权利要求14所述的分层缓存方法，其特征在于，所述高速存储介质为NVMe SSD。15. The hierarchical caching method according to claim 14, wherein the high-speed storage medium is an NVMe SSD.

16.根据权利要求14所述的分层缓存方法，其特征在于，所述分层缓存方法还包括：16. The layered caching method according to claim 14, wherein the layered caching method further comprises:
当满足清除条件，清除本计算节点上中高速存储介质中存储的数据。When the clearing condition is met, clear the data stored in the high-speed storage medium on the current computing node.

17.一种分层缓存系统，其特征在于，应用于分布式存储系统的每一计算节点，所述分布式存储系统包括客户端层，所述客户端层部署客户端进程和服务端进程，所述分层缓存系统包括：17. A layered cache system, characterized in that it is applied to each computing node of a distributed storage system, wherein the distributed storage system includes a client layer, and the client layer deploys a client process and a server process; the layered cache system includes:
监测模块，用于利用所述客户端进程监测客户端向所述分布式存储系统发出的文件IO操作请求，当监测到所述文件IO操作请求，将所述文件IO操作请求重定向到所述服务端进程；The monitoring module is configured to use the client process to monitor the file IO operation request sent by the client to the distributed storage system, and when the file IO operation request is detected, redirect the file IO operation request to the server process;
处理模块，用于利用所述服务端进程判断所述文件IO操作请求对应的目标存储位置是否为聚合缓存层，若否，触发第一读取模块，若是，触发第二读取模块；A processing module, configured to use the server process to judge whether the target storage location corresponding to the file IO operation request is an aggregation cache layer, if not, trigger the first reading module, and if so, trigger the second reading module;
第一读取模块，用于从所述分布式存储系统底层读取所述文件IO操作请求对应的数据，并将所述数据缓存到所述聚合缓存层；The first reading module is configured to read the data corresponding to the file IO operation request from the bottom layer of the distributed storage system, and cache the data to the aggregation cache layer;
第二读取模块，用于从所述聚合缓存层中读取所述数据，并将所述数据返回到所述客户端进程，以便所述客户端进程将所述数据返回至所述客户端。A second reading module, configured to read the data from the aggregation cache layer, and return the data to the client process, so that the client process returns the data to the client.

18.一种电子设备，其特征在于，包括：18. An electronic device, characterized in that it comprises:
存储器，用于存储计算机程序；a memory for storing a computer program;
处理器，用于执行所述计算机程序时实现如权利要求1-16任意一项所述的分层缓存方法的步骤。A processor, configured to implement the steps of the hierarchical caching method according to any one of claims 1-16 when executing the computer program.

19.一种分布式存储系统，其特征在于，包括存储底层模块和多个节点，每个所述节点均包括分层客户端进程、分层服务端进程及存储介质，各个所述节点的存储介质构成聚合缓存层，其中：19. A distributed storage system, characterized in that it includes a storage bottom-layer module and a plurality of nodes, each of which includes a layered client process, a layered server process and a storage medium, and the storage media of the nodes constitute an aggregation cache layer, wherein:
所述分层客户端进程，用于监测客户端发出的文件IO操作请求，当监测到所述文件IO操作请求，将所述文件IO操作请求重定向到所述分层服务端进程；The layered client process is used to monitor the file IO operation request sent by the client, and when the file IO operation request is detected, redirect the file IO operation request to the layered server process;
所述分层服务端进程，用于判断所述文件IO操作请求对应的目标存储位置是否为所述聚合缓存层，若否，从所述存储底层模块读取所述文件IO操作请求对应的数据，并将所述数据发送到所述聚合缓存层，若是，从所述聚合缓存层中读取所述数据，并将所述数据返回到所述分层客户端进程，以便所述分层客户端进程将所述数据返回至所述客户端；The layered server process is used to judge whether the target storage location corresponding to the file IO operation request is the aggregation cache layer; if not, read the data corresponding to the file IO operation request from the storage bottom-layer module and send the data to the aggregation cache layer; if so, read the data from the aggregation cache layer and return the data to the layered client process, so that the layered client process returns the data to the client;
所述聚合缓存层，用于存储所述分层服务端进程发送的数据。The aggregation cache layer is used to store the data sent by the layered server process.

20.一种计算机可读存储介质，其特征在于，所述计算机可读存储介质上存储有计算机程序，所述计算机程序被处理器执行时实现如权利要求1-16任意一项所述的分层缓存方法的步骤。20. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the layered caching method according to any one of claims 1-16 are implemented.
CN202310220769.3A 2023-03-09 2023-03-09 A layered cache method, system and related components Active CN116048425B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202310220769.3A CN116048425B (en) 2023-03-09 2023-03-09 A layered cache method, system and related components
PCT/CN2024/080583 WO2024183799A1 (en) 2023-03-09 2024-03-07 Hierarchical caching method and system, and related component

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310220769.3A CN116048425B (en) 2023-03-09 2023-03-09 A layered cache method, system and related components

Publications (2)

Publication Number Publication Date
CN116048425A CN116048425A (en) 2023-05-02
CN116048425B true CN116048425B (en) 2023-07-14

Family

ID=86127618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310220769.3A Active CN116048425B (en) 2023-03-09 2023-03-09 A layered cache method, system and related components

Country Status (2)

Country Link
CN (1) CN116048425B (en)
WO (1) WO2024183799A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116048425B (en) * 2023-03-09 2023-07-14 浪潮电子信息产业股份有限公司 A layered cache method, system and related components
CN119556864B (en) * 2024-12-27 2025-05-23 苏州元脑智能科技有限公司 Data management method, device, medium and program product
CN119782008B (en) * 2025-02-28 2025-05-30 中国科学技术大学 Multi-stage caching method for accelerating AI data processing
CN120578608B (en) * 2025-07-31 2025-09-26 苏州元脑智能科技有限公司 Key value cache management system, method, equipment and medium in model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101158965A (en) * 2007-10-25 2008-04-09 中国科学院计算技术研究所 A file reading system and method of a distributed file system
CN113688113A (en) * 2021-07-28 2021-11-23 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Metadata prefetching system and method for distributed file system

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7165096B2 (en) * 2000-12-22 2007-01-16 Data Plow, Inc. Storage area network file system
US8495250B2 (en) * 2009-12-16 2013-07-23 International Business Machines Corporation Asynchronous file operations in a scalable multi-node file system cache for a remote cluster file system
CN103744975A (en) * 2014-01-13 2014-04-23 锐达互动科技股份有限公司 Efficient caching server based on distributed files
US9323615B2 (en) * 2014-01-31 2016-04-26 Google Inc. Efficient data reads from distributed storage systems
CN104317736B (en) * 2014-09-28 2017-09-01 曙光信息产业股份有限公司 A kind of distributed file system multi-level buffer implementation method
US10664405B2 (en) * 2017-11-03 2020-05-26 Google Llc In-memory distributed cache
CN111984191A (en) * 2020-08-05 2020-11-24 华东计算技术研究所(中国电子科技集团公司第三十二研究所) A multi-client caching method and system supporting distributed storage
CN112000287B (en) * 2020-08-14 2022-06-17 北京浪潮数据技术有限公司 IO request processing device, method, equipment and readable storage medium
CN113835614A (en) * 2020-09-17 2021-12-24 北京焱融科技有限公司 A kind of SSD intelligent caching method and system based on distributed file storage client
CN112363676A (en) * 2020-11-18 2021-02-12 无锡江南计算技术研究所 Control method and system based on low access delay distributed storage system
CN116048425B (en) * 2023-03-09 2023-07-14 浪潮电子信息产业股份有限公司 A layered cache method, system and related components

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101158965A (en) * 2007-10-25 2008-04-09 中国科学院计算技术研究所 A file reading system and method of a distributed file system
CN113688113A (en) * 2021-07-28 2021-11-23 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Metadata prefetching system and method for distributed file system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A. A. Shvidkiy; A. A. Savelieva; A. A. Zarubin. Caching Methods Analysis for Improving Distributed Storage Systems Performance. 2021 Systems of Signal Synchronization, Generating and Processing in Telecommunications, 2021, Abstract. *

Also Published As

Publication number Publication date
WO2024183799A1 (en) 2024-09-12
CN116048425A (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN116048425B (en) A layered cache method, system and related components
US12235874B2 (en) Cross-organization and cross-cloud automated data pipelines
CA3027756C (en) Systems and methods for efficient distribution of stored data objects
US11562091B2 (en) Low latency access to physical storage locations by implementing multiple levels of metadata
US11727004B2 (en) Context dependent execution time prediction for redirecting queries
US20200050694A1 (en) Burst Performance of Database Queries According to Query Size
US20220327132A1 (en) Real-time streaming data ingestion into database tables
US11055262B1 (en) Extensible streams on data sources
US11080207B2 (en) Caching framework for big-data engines in the cloud
CN114265814B (en) Data lake file system based on object storage
US12197437B2 (en) Selecting between hydration-based scanning and stateless scale-out scanning to improve query performance
US12430357B2 (en) Replication of unstructured staged data between database deployments
US11016676B2 (en) Spot coalescing of distributed data concurrent with storage I/O operations
CN115136133A (en) Single use execution environment for on-demand code execution
WO2017126003A1 (en) Computer system including plurality of types of memory devices, and method therefor
CN116450966A (en) Cache access method and device, equipment and storage medium
Zhong et al. Dpc: Dpu-accelerated high-performance file system client
US20210397581A1 (en) Sparse file system implemented with multiple cloud services
CN119377258A (en) Data collection and retrieval distributed system, method and computer device
Branagan et al. Understanding the top 5 Redis performance metrics
KR102771046B1 (en) Apparatus for preloading data in distributed computing enviroment and method using the same
US9317546B2 (en) Storing changes made toward a limit
CN120215834A (en) Power user activity data processing method and device based on storage integrated system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant