HK1231197B - Workload-aware i/o scheduler in software-defined hybrid storage system

Info

Publication number: HK1231197B
Application number: HK17104474.8A
Authority: HK
Inventors: 陈文贤; 黄明仁
Original assignee: 先智云端数据股份有限公司
Filing date: 2017-05-04
Publication date: 2019-10-18

Description

Workload-aware input/output scheduler in software-defined hybrid storage systems

Technical Field

The present invention relates to an input/output scheduler, and in particular to a workload-aware input/output scheduler in a software-defined hybrid storage system.

Background Art

Computer operating systems use an input/output (I/O) scheduler to determine the order in which block I/O operations are submitted to storage volumes. Depending on its purpose, an I/O scheduler may aim to minimize the time spent on disk seeks, prioritize certain I/O requests, give each running process a share of the disk bandwidth, and/or guarantee that certain I/O requests begin executing before a specific deadline. For example, the deadline scheduler in the Linux kernel guarantees a service start time for each request; it does so by imposing a deadline on all I/O operations to prevent request starvation. In addition, by using separate I/O queues, the deadline scheduler favors reads over writes. It works well for database workloads. Another example is the Completely Fair Queuing (CFQ) scheduler. The CFQ scheduler places synchronous requests submitted by processes into a number of per-process queues and then allocates time slices for each queue to access the disk. Consequently, the CFQ scheduler is well suited to sequential reads of video or audio streams and to workloads from general hosts.

The schedulers described above were proposed to improve the performance of hard disks or of storage systems composed of hard disks. Hard disk storage systems have the following characteristics. First, multiple I/O requests (both reads and writes) may be merged into a single request that is processed in one movement of the read/write head. The number of head movements is thereby reduced, which increases the throughput of the hard disk. Second, I/O requests are sorted to improve seek time by reducing the back-and-forth movement of the read/write head. Based on these characteristics, I/O requests queued for future processing can be merged and sorted. With different schedulers, workloads with different characteristics can be processed with better performance.
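The merging and sorting described above can be sketched as follows. This is a minimal illustration rather than the behavior of any particular scheduler: the `Request` type, its `start` and `length` fields, and the `sort_and_merge` function are hypothetical names, and the sketch ignores whether a request is a read or a write.

```python
from dataclasses import dataclass


@dataclass
class Request:
    start: int   # first block address of the request (hypothetical field)
    length: int  # number of blocks covered by the request


def sort_and_merge(requests):
    """Sort requests by start address, then merge those that touch or overlap."""
    merged = []
    for req in sorted(requests, key=lambda r: r.start):
        if merged and req.start <= merged[-1].start + merged[-1].length:
            # The request continues the previous one: extend it instead of
            # queuing a separate head movement.
            last = merged[-1]
            end = max(last.start + last.length, req.start + req.length)
            merged[-1] = Request(last.start, end - last.start)
        else:
            merged.append(Request(req.start, req.length))
    return merged


queue = [Request(100, 8), Request(0, 8), Request(8, 8), Request(108, 4)]
result = sort_and_merge(queue)
```

Here four queued requests collapse into two, one covering blocks 0 to 15 and one covering blocks 100 to 111, so the head needs two movements instead of four.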

A mainstream class of storage systems, particularly in cloud systems, uses a mix of solid-state drives (SSDs) and hard disks rather than hard disks alone, so current schedulers may fail to achieve their intended goals when deployed in such storage systems. SSDs have distinct characteristics that differ from those of hard disks, as follows. First, I/O requests to an SSD do not need to be merged or sorted, which means that no time needs to be spent on merging and sorting; I/O requests should be sent to the SSD as quickly as possible. Second, because many modern SSDs have multiple channels that can accommodate multiple I/O requests at the same time, SSD I/O requests can be processed in parallel. If the storage system to which the scheduler is applied is a software-defined hybrid storage system, meeting the requirements placed on the scheduler becomes even more complicated. Given the presence of SSDs, existing schedulers need to be changed and improved.

Moreover, on further examination, the traffic characteristics of workloads are another important issue. Any workload may have characteristics that differ from those of other workloads, such as its I/O pattern (sequential or random), read/write ratio, and SSD cache hits. For example, an On-Line Transaction Processing (OLTP) database workload has a random I/O pattern, a read/write ratio greater than 1, and a small storage block size, whereas a MongoDB workload has a sequential I/O pattern, a read/write ratio less than 1, and a large storage block size. If both workloads run on the same hybrid storage system, recently developed schedulers cannot meet the performance requirements set for both in their Service Level Agreements (SLAs); at the very least, a Noisy Neighbor Problem will exist and affect those workloads.

Some existing technologies provide solutions to the above requirements, one example of which is disclosed in U.S. Patent No. 8,756,369. In that patent, a storage system includes a command classifier that determines a target storage device for at least one SSD command and one hard disk command, places the SSD command in an SSD standby queue if it is targeted to an SSD storage device of the storage system, and places the hard disk command in a hard disk standby queue if it is targeted to a hard disk storage device of the storage system. The storage system also includes the SSD standby queue, which queues SSD commands targeted to the SSD storage device, and the hard disk standby queue, which queues hard disk commands targeted to the hard disk storage device. Meanwhile, a command scheduler takes hard disk and SSD commands from the standby queues and places them in a command processor. The command scheduler places a particular (hard disk or SSD) command from its respective standby queue into the command processor based on the availability level of a processing queue of the target device corresponding to that command. The command processor then passes the storage commands to the processing queue.

The storage system of U.S. Patent No. 8,756,369 distinguishes hard disk commands (I/O requests) from SSD commands, which helps the hardware operation for a single workload. However, when applied to multiple workloads, the storage system may not operate as designed. Furthermore, for the various workloads running on the storage system, there is no suitable way to coordinate the requests from each workload so that every workload can meet its Service Level Agreement or Quality of Service (QoS) requirements. Therefore, a workload-aware input/output scheduler is needed in software-defined hybrid storage systems to solve the above problems.

Summary of the Invention

To meet the above need, the present invention provides a workload-aware input/output scheduler in a software-defined hybrid storage (SDHS) system, where the SDHS system includes at least one hard disk and one solid-state drive (SSD). The scheduler includes a queue management module, a workload characteristic database, and a traffic monitoring module. The queue management module manages queues, read requests, and write requests, and comprises a request receiving submodule, a request control submodule, and a request scheduling submodule. The request receiving submodule temporarily stores the read requests and write requests. The request control submodule creates workload queues, dynamically configures the workload queues according to a scheduler configuration function, and arranges the read requests and write requests into the workload queues. The request scheduling submodule creates device queues and dispatches each read request or write request from the workload queues to a specific device queue. The workload characteristic database stores the characteristics of the workloads for access. The traffic monitoring module monitors and continuously records the value of a performance parameter of the SDHS system and provides those values to the request control submodule.

For each workload queue, the scheduler configuration function calculates a queue depth and a waiting time based on the characteristics provided by the workload characteristic database and the received values of the performance parameter, so that the future value of the performance parameter of the software-defined hybrid storage system falls between a performance guarantee value and a performance adjustment value set for that performance parameter.
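As a rough illustration of the band between the two limits, the sketch below places a measured IOPS or throughput value relative to its performance guarantee value and performance adjustment value. The function name `band_position`, the returned labels, and the `tolerance` parameter are assumptions made for this example; the text does not define numerically what "close to" means.

```python
def band_position(value, guarantee, adjustment, tolerance=0.1):
    """Place a measured IOPS or throughput value relative to its two limits.

    tolerance is a hypothetical fraction of a limit that counts as "close to".
    """
    if guarantee > adjustment:
        raise ValueError("guarantee value must not exceed adjustment value")
    if value <= guarantee * (1 + tolerance):
        return "near_or_below_guarantee"
    if value >= adjustment * (1 - tolerance):
        return "near_adjustment"
    return "in_band"


# Hypothetical limits: at least 1000 IOPS guaranteed, adjusted near 5000 IOPS.
low = band_position(1050, guarantee=1000, adjustment=5000)
mid = band_position(3000, guarantee=1000, adjustment=5000)
high = band_position(4800, guarantee=1000, adjustment=5000)
```

A latency value would need the opposite orientation, since lower latency is better; that case is left out of this sketch.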

The workload-aware input/output scheduler in the software-defined hybrid storage system may further include a traffic model module, which models the storage traffic of the requests from the workloads and provides predicted storage traffic for the characteristics at a specific point in time in the future.

Preferably, the characteristics are the read/write ratio, the merge ratio, the SSD hit rate, and the storage block size. The performance parameter is input/output operations per second (IOPS), throughput, latency, or a combination of the three. The performance guarantee value and the performance adjustment value are defined by the Service Level Agreement (SLA) or Quality of Service (QoS) requirements of the workload. Each workload queue is categorized as deep, medium, or shallow, and each waiting time is categorized as long delay, medium delay, or short delay, wherein the queue depth of a deep workload queue accommodates more read requests or write requests than that of a medium workload queue; the queue depth of a medium workload queue accommodates more read requests or write requests than that of a shallow workload queue; a medium size is defined for the storage block size; the long delay is longer than the medium delay; and the medium delay is longer than the short delay.

In one example, if the received input/output operations per second (IOPS) or throughput value is close to its performance adjustment value, the received latency value is close to or lower than its performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to medium and the waiting time for each workload queue is set to short delay.

In another example, if the received IOPS or throughput value is close to its performance adjustment value, the received latency value is close to or lower than its performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is less than the medium size, then the queue depth of the workload queue is set to deep and the waiting time for each workload queue is set to medium delay.

In another example, if the received IOPS or throughput value is close to its performance adjustment value, the received latency value is close to or lower than its performance guarantee value, and the read/write ratio is less than 1, then the queue depth of the workload queue is set to medium and the waiting time for each workload queue is set to short delay.

In another example, if the received IOPS or throughput value is close to or lower than its performance guarantee value, the received latency value is close to or lower than its performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to shallow and the waiting time for each workload queue is set to short delay.

In another example, if the received IOPS or throughput value is close to or lower than its performance guarantee value, the received latency value is close to or lower than its performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is less than the medium size, then the queue depth of the workload queue is set to medium and the waiting time for each workload queue is set to medium delay.

In another example, if the received IOPS or throughput value is close to or lower than its performance guarantee value, the received latency value is close to or lower than its performance guarantee value, and the read/write ratio is less than 1, then the queue depth of the workload queue is set to shallow and the waiting time for each workload queue is set to short delay.

In another example, if the received IOPS or throughput value is close to its performance adjustment value, the received latency value is neither close to nor lower than its performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to medium and the waiting time for each workload queue is set to medium delay.

In another example, if the received IOPS or throughput value is close to its performance adjustment value, the received latency value is neither close to nor lower than its performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is less than the medium size, then the queue depth of the workload queue is set to deep and the waiting time for each workload queue is set to long delay.

In another example, if the received IOPS or throughput value is close to its performance adjustment value, the received latency value is neither close to nor lower than its performance guarantee value, the read/write ratio is less than 1, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to medium and the waiting time for each workload queue is set to short delay.

In another example, if the received IOPS or throughput value is close to its performance adjustment value, the received latency value is neither close to nor lower than its performance guarantee value, the read/write ratio is less than 1, and the storage block size is less than the medium size, then the queue depth of the workload queue is set to medium and the waiting time for each workload queue is set to medium delay.

In another example, if the received IOPS or throughput value is close to or lower than its performance guarantee value, the received latency value is neither close to nor lower than its performance guarantee value, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to shallow and the waiting time for each workload queue is set to short delay.

In another example, if the received IOPS or throughput value is close to or lower than its performance guarantee value, the received latency value is neither close to nor lower than its performance guarantee value, and the storage block size is less than the medium size, then the queue depth of the workload queue is set to shallow and the waiting time for each workload queue is set to medium delay.

In another example, if the SSD hit rate increases, the queue depth of the workload queue remains the same or becomes shallower, and the waiting time for each workload queue remains the same or becomes shorter; otherwise, the queue depth of the workload queue remains the same or becomes deeper, and the waiting time for each workload queue remains the same or becomes longer.

In another example, if the merge ratio increases, the queue depth of the workload queue remains the same or becomes shallower, and the waiting time for each workload queue remains the same or becomes shorter; otherwise, the queue depth of the workload queue remains the same or becomes deeper, and the waiting time for each workload queue remains the same or becomes longer.

Preferably, the medium size is 8 KB.
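The twelve threshold-based examples above can be read as a decision table. The sketch below encodes them under some assumptions: the function name `configure_queue` and its parameters are hypothetical, and the caller is assumed to have already decided whether the IOPS or throughput value is close to the performance adjustment value (`near_adjustment=True`) or close to or below the performance guarantee value (`near_adjustment=False`), and whether the latency value is close to or below its guarantee value (`latency_ok`).

```python
MEDIUM_BLOCK_BYTES = 8 * 1024  # preferred medium size of 8 KB


def configure_queue(near_adjustment, latency_ok, rw_ratio, block_bytes):
    """Return (queue depth, waiting time) for one workload queue."""
    reads_dominate = rw_ratio >= 1
    large_block = block_bytes >= MEDIUM_BLOCK_BYTES

    if latency_ok:
        if not reads_dominate:
            # Block size is not considered in these two cases.
            return ("medium", "short") if near_adjustment else ("shallow", "short")
        if near_adjustment:
            return ("medium", "short") if large_block else ("deep", "medium")
        return ("shallow", "short") if large_block else ("medium", "medium")

    if near_adjustment:
        if reads_dominate:
            return ("medium", "medium") if large_block else ("deep", "long")
        return ("medium", "short") if large_block else ("medium", "medium")

    # Near or below the guarantee value with poor latency:
    # the read/write ratio is not considered.
    return ("shallow", "short") if large_block else ("shallow", "medium")


# Read-heavy workload with 4 KB blocks, throughput near the adjustment value,
# latency not meeting its guarantee value.
depth, wait = configure_queue(True, False, rw_ratio=3.0, block_bytes=4096)
```

Nested conditions are used here only to keep the table compact; a lookup table keyed by the four conditions would express the same rules.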

Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of a storage system with a workload-aware input/output scheduler in a software-defined hybrid storage system according to the present invention;

FIG. 2 is a schematic diagram of the operation of a request control submodule and a request scheduling submodule;

FIG. 3 shows the complete set of workload conditions for the input/output scheduler.

Description of reference numerals: 1-remote user; 5-cloud storage system; 10-scheduler; 50-host; 55-file system; 60-device driver module; 61-solid-state drive; 62-solid-state drive; 63-hard disk; 64-hard disk; 100-queue management module; 101-request receiving submodule; 102-request control submodule; 103-request scheduling submodule; 110-workload characteristic database; 120-traffic monitoring module; 130-traffic model module; W1-workload queue group; W2-workload queue group; Wn-workload queue group; QW1-queue depth; QW2-queue depth; QWn-queue depth; QS1-first solid-state drive device queue; QS2-second solid-state drive device queue; QH1-first hard disk device queue; QH2-second hard disk device queue.

Detailed Description

The present invention will be described more specifically with reference to the following embodiments.

FIG. 1 illustrates an embodiment of the present invention. The cloud storage system 5 shown in FIG. 1 is a software-defined hybrid storage system. Software-defined hybrid storage enables policy-based, rapid creation, removal, and management of storage systems in a cloud architecture. It usually separates the software from the storage hardware it manages by means of storage virtualization. Unlike general software-defined storage, a software-defined hybrid storage system contains more than one type of disk. It typically consists of two types, such as hard disks and solid-state drives, to meet the needs of workloads with different characteristics. The cloud storage system 5 can therefore serve many workloads. A workload may be an On-Line Transaction Processing database, a video streaming service, a virtual desktop infrastructure environment, an e-mail server, a backup server, or a file server. Any service or architecture that uses software-defined hybrid storage as its storage unit is a workload within the spirit of the present invention.

The cloud storage system 5 receives read or write requests from many users on the network. To simplify the description, a single remote user 1 represents all users. The I/O operations of the cloud storage system 5 are essentially the same for all users, but the processing priority and response latency for each workload differ from situation to situation. When reading this embodiment, it should be kept in mind that many remote users 1 are issuing read/write requests to the cloud storage system 5 and waiting for responses at the same time.

A workload-aware input/output scheduler 10 is installed in the cloud storage system 5 and is the key part of the present invention. The scheduler 10 is workload-aware because it can distinguish which workload a request comes from. The scheduler 10 includes a queue management module 100, a workload characteristic database 110, a traffic monitoring module 120, and a traffic model module 130, each of which is described in detail below. Note that the input/output scheduler 10 may run as software on a host 50 of the cloud storage system 5, or it may be a hardware device within the host 50 that works for the cloud storage system 5. Alternatively, the input/output scheduler 10 may be implemented partly in hardware with the remainder driven by software; the present invention is not limited in this respect.

The main function of the queue management module 100 is to manage queues, read requests, and write requests, where the read requests and write requests come from a file system 55 of the host 50. The queue management module 100 includes three important submodules: a request receiving submodule 101, a request control submodule 102, and a request scheduling submodule 103. The request receiving submodule 101 temporarily stores the read requests and write requests from the file system 55. The requests are later classified by workload and then arranged into the write queue and the read queue of a workload queue group. The write queue holds the write requests of a particular workload in order, and the read queue holds the read requests of the same workload in order. It should be emphasized again that many workloads use the cloud storage system 5 at the same time, so there are likewise many workload queue groups, W1 to Wn, as shown in FIG. 2.
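The classification of buffered requests into per-workload read and write queues could look like the sketch below. The tuple layout `(workload, op, payload)` and the names `new_group` and `classify` are assumptions made for illustration; how a request is attributed to a workload is not specified at this point in the text.

```python
from collections import defaultdict, deque


def new_group():
    """Create one workload queue group: a read queue and a write queue."""
    return {"read": deque(), "write": deque()}


def classify(pending):
    """Arrange buffered requests into workload queue groups, preserving order."""
    groups = defaultdict(new_group)
    for workload, op, payload in pending:
        groups[workload][op].append(payload)
    return groups


# Requests as temporarily stored by the receiving stage (hypothetical data).
pending = [
    ("oltp", "read", "r1"),
    ("backup", "write", "w1"),
    ("oltp", "write", "w2"),
    ("oltp", "read", "r2"),
]
groups = classify(pending)
```

Each group keeps arrival order within its read queue and its write queue, which matches the description of queues that hold requests in sequence for one workload.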

The request control submodule 102 creates workload queues (write queues and read queues) for the requests of each workload. After the workload queues are set up, the request control submodule 102 can further configure them dynamically according to a scheduler configuration function. The scheduler configuration function determines the depth of the workload queues and how long each request waits for an opportunity to be merged, as described in detail later. The request control submodule 102 then arranges the read requests and write requests into the corresponding workload queues.

The request scheduling submodule 103 is responsible for creating device queues. Each device queue contains the requests (reads and/or writes) for a specific device, namely solid-state drive 61 or 62 or hard disk 63 or 64. The requests may come from one workload queue group or from several. The request scheduling submodule 103 dispatches each read request or write request from a workload queue to a specific device queue. A request is processed once the requests queued ahead of it have been processed. In some designs of the host 50, dispatching can be carried out by having a device driver module 60 trigger the driver of the corresponding storage device. The request scheduling submodule 103 provides a function that other schedulers cannot: it separates requests to hard disks from requests to solid-state drives. Requests to solid-state drives can therefore be executed directly without waiting in a queue behind requests to hard disks, which improves the performance of the cloud storage system 5.
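A minimal sketch of dispatching from a workload queue into separate device queues follows. The queue names reuse the reference numerals QS1, QS2, QH1, and QH2; the `dispatch` function and the assumption that each request already carries the name of its target device queue are hypothetical.

```python
from collections import deque

# One queue per device: two solid-state drives and two hard disks.
device_queues = {name: deque() for name in ("QS1", "QS2", "QH1", "QH2")}


def dispatch(workload_queue, device_queues):
    """Move every request from one workload queue to its target device queue."""
    while workload_queue:
        target, payload = workload_queue.popleft()
        device_queues[target].append(payload)


workload_queue = deque([("QS1", "read A"), ("QH1", "write B"), ("QS1", "read C")])
dispatch(workload_queue, device_queues)
```

Because solid-state drive requests land in their own queues, "read C" does not wait behind "write B", which is destined for a hard disk.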

The workload characteristic database 110 stores the characteristics of each workload for access. These characteristics are the factors that the scheduler configuration function uses to determine the queue depth and the waiting time. According to the present invention, the characteristics are the read/write ratio, the SSD hit rate, the merge ratio, and the storage block size. The read/write ratio is the ratio of the number of read requests to the number of write requests of a workload. A workload usually has a specific usage pattern, in which read requests outnumber write requests or write requests outnumber read requests. In some special cases, a workload such as a backup server may have write requests but no read requests.
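One possible shape for a record in such a database is sketched below. The class name `WorkloadCharacteristics` and its fields are hypothetical; in particular, storing raw read and write counts and deriving the ratio from them is an assumption, as is reporting an infinite ratio for a workload with reads but no writes.

```python
import math
from dataclasses import dataclass


@dataclass
class WorkloadCharacteristics:
    reads: int           # number of read requests observed
    writes: int          # number of write requests observed
    ssd_hit_rate: float  # fraction between 0 and 1
    merge_ratio: float   # fraction between 0 and 1
    block_bytes: int     # storage block size in bytes

    @property
    def rw_ratio(self):
        """Read requests divided by write requests."""
        if self.writes == 0:
            return math.inf if self.reads else 0.0
        return self.reads / self.writes


oltp = WorkloadCharacteristics(900, 300, 0.6, 0.1, 4096)
backup = WorkloadCharacteristics(0, 500, 0.1, 0.4, 65536)
```

The backup record illustrates the special case mentioned above: with writes but no reads, the ratio is 0 and falls on the "less than 1" side of the configuration rules.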

The SSD hit rate is the frequency with which the solid-state drives in the cloud storage system 5 are accessed, regardless of which workload the requests come from. The number of solid-state drives, or the amount of solid-state drive space, used for each workload may vary with how the workload is used. The SSD hit rate for each workload therefore changes over time. If the SSD hit rate increases, the queue depth of the workload queue may remain the same or become shallower, and the waiting time for each workload queue may remain the same or become shorter. Conversely, the queue depth of the workload queue may remain the same or become deeper, and the waiting time for each workload queue may remain the same or become longer. Whether the queue depth or waiting time is kept or changed depends on the degree of increase or decrease. The threshold between keeping and changing, and the magnitude of the change, can be set before the cloud storage system 5 goes into operation. The above is a guideline for formulating rules; the quantitative meaning of "shallow", "short", "deep", and "long" is explained later. The comparative forms describe an increase or decrease relative to the current level.
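The trend rule for the SSD hit rate can be sketched as a one-level shift guarded by a threshold. The function `adjust_for_trend`, the `threshold` value of 0.1, and the choice to move exactly one level are assumptions made for illustration; the text only says that the threshold and the magnitude of change are configured in advance.

```python
DEPTHS = ["shallow", "medium", "deep"]
WAITS = ["short", "medium", "long"]


def adjust_for_trend(depth, wait, delta, threshold=0.1):
    """Shift depth and wait by one level when the hit rate changed enough.

    delta is the change in hit rate since the last evaluation; a positive
    delta moves toward shallower queues and shorter waits.
    """
    if abs(delta) < threshold:
        return depth, wait  # change too small: keep the current levels
    step = -1 if delta > 0 else 1
    d = min(max(DEPTHS.index(depth) + step, 0), len(DEPTHS) - 1)
    w = min(max(WAITS.index(wait) + step, 0), len(WAITS) - 1)
    return DEPTHS[d], WAITS[w]


rising = adjust_for_trend("medium", "medium", delta=0.2)
steady = adjust_for_trend("medium", "medium", delta=0.05)
falling = adjust_for_trend("medium", "short", delta=-0.3)
```

The same shape of rule is stated for the merge ratio, so the function could be reused with the change in merge ratio as `delta`.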

The read/write ratio is an indicator of the read/write pattern. The merge ratio counts hard disk accesses only: it is the proportion of requests that are merged with other requests and handled in the same movement of the read/write head (a change in processing priority). For example, at the start of a fixed time window (t_a), 10 hard disk read requests are in the queue. By the end of the window (t_b), merging has reduced the number of requests in the queue to 7, so the merge ratio at t_b is 0.3, which follows from the formula below:

$$\text{merge ratio}(t_b) = \frac{r(t_a) - r(t_b)}{r(t_a)}, \quad \text{where } t_b > t_a$$

r(t) is the number of requests in a queue at time t. The higher the merge ratio, the greater the hard disk throughput (more than 30% in this example). However, if requests wait long enough to reach maximum throughput, latency may also grow, and longer latency degrades the user experience. The merge ratio of a workload may also change over time. If the merge ratio increases, the queue depth of the workload queue may stay the same or become shallower, and the waiting time for each workload queue may stay the same or become shorter. Otherwise, the queue depth of the workload queue may stay the same or become deeper, and the waiting time for each workload queue may stay the same or become longer. As before, the terms "shallow," "short," "deep," and "long" are quantified later, and the comparatives describe an increase or decrease relative to the current level.
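As a worked check of the example, the sketch below computes the merge ratio under the assumption that it is the fraction of the requests queued at t_a that have been absorbed by merging at t_b. That reading reproduces the stated result of 0.3 for 10 requests reduced to 7. The function name `merge_ratio` is hypothetical.

```python
def merge_ratio(r_ta, r_tb):
    """Fraction of requests queued at t_a that were merged away by t_b.

    r_ta: number of requests in the queue at the start of the window
    r_tb: number of requests in the queue at the end of the window
    Assumes no new requests arrive during the window.
    """
    if r_ta <= 0:
        raise ValueError("r_ta must be positive")
    if r_tb < 0 or r_tb > r_ta:
        raise ValueError("r_tb must lie between 0 and r_ta")
    return (r_ta - r_tb) / r_ta
```

With the figures from the text, `merge_ratio(10, 7)` evaluates to 0.3.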

The storage block size is the basic unit of storage space of a storage device, whether a hard disk or an SSD. It can range from 4 KB or smaller to 16 MB or larger. In general, 8 KB is the most commonly used medium size. As storage technology evolves, the medium size will increase. For illustration, this specification sets the medium size to 8 KB; this does not limit the present invention. The scheduler configuration function applies different strategies for deciding queue depth and waiting time depending on the storage block size and the other conditions, as described below.

The traffic monitoring module 120 monitors and continuously records the value of a performance parameter of the cloud storage system 5, and provides that value to the request control submodule 102. The performance parameter can be input/output operations per second (IOPS), throughput, latency, or a combination of the three. The traffic model module 130 models the storage traffic generated by workload requests and provides predicted storage traffic, with certain characteristics, for a specific point in time in the future. The data used for modeling comes from the traffic monitoring module 120. Any suitable method, algorithm, or module can be applied; preferably, the storage traffic model provided by the same inventors in U.S. patent application No. 14/290,533 is used, to which reference may be made for the same technique. The modeled storage traffic from workload requests and the predicted storage traffic are provided to the request control submodule 102 as a reference for configuring future workload queues and waiting times.
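A monitoring component of this kind could be sketched as a bounded in-memory log. The class names, fields, units, and the `mean_latency_ms` helper below are all assumptions made for illustration; the specification does not describe how module 120 stores or summarizes its records.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp: float        # seconds since some epoch
    iops: float             # input/output operations per second
    throughput_mbps: float  # throughput in MB/s
    latency_ms: float       # latency in milliseconds


class TrafficMonitor:
    """Keeps the most recent performance samples (hypothetical design)."""

    def __init__(self, max_samples=1000):
        # Oldest samples are dropped automatically once the log is full.
        self._samples = deque(maxlen=max_samples)

    def record(self, sample):
        self._samples.append(sample)

    def latest(self):
        """Most recent sample, or None if nothing has been recorded."""
        return self._samples[-1] if self._samples else None

    def mean_latency_ms(self, last_n):
        """Average latency over the last_n most recent samples."""
        if last_n <= 0:
            raise ValueError("last_n must be positive")
        recent = list(self._samples)[-last_n:]
        if not recent:
            return None
        return sum(s.latency_ms for s in recent) / len(recent)
```

A request control component could poll `latest()` or a windowed average before recomputing queue depths and waiting times.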

The operation of the scheduler configuration function is described below with examples covering all conditions. The scheduler configuration function calculates the queue depth and the waiting time for each workload queue based on the characteristics provided by the workload characteristic database and the received performance parameter values. In this way, the scheduler configuration function can adjust the performance parameter value of the cloud storage system 5 so that, in the future, it falls between a performance guarantee value and a performance adjustment value set for that performance parameter. The performance guarantee value and the performance adjustment value are defined by the service level agreement (SLA) or the quality of service (QoS) requirements of the workload. Each workload queue is classified as deep, medium, or shallow. Each waiting time is classified as long latency, medium latency, or short latency.

In the spirit of the present invention, there are no absolute boundaries between the workload queue classes, nor between the waiting time classes assigned during operation. As a guideline, with the storage block size at the medium size, the queue depth of a deep workload queue accommodates more read or write requests than that of a medium workload queue, and the queue depth of a medium workload queue accommodates more read or write requests than that of a shallow workload queue. Similarly, long latency is longer than medium latency, and medium latency is longer than short latency.
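The ordering guideline can be expressed as a small sketch. The depth values 2, 5, and 7 and the waiting times 20 ms and 100 ms are the ones used in the embodiment described in this specification; the 50 ms value for medium latency is an assumption, because the specification gives no number for it. All identifiers are hypothetical.

```python
from enum import IntEnum


class Depth(IntEnum):
    SHALLOW = 1
    MEDIUM = 2
    DEEP = 3


class Wait(IntEnum):
    SHORT = 1
    MEDIUM = 2
    LONG = 3


# Requests held per queue at the medium block size (values from the embodiment).
DEPTH_REQUESTS = {Depth.SHALLOW: 2, Depth.MEDIUM: 5, Depth.DEEP: 7}

# Waiting time in milliseconds. MEDIUM = 50 is an assumed placeholder.
WAIT_MS = {Wait.SHORT: 20, Wait.MEDIUM: 50, Wait.LONG: 100}


def strictly_increasing(values):
    """True if every value is smaller than the one after it."""
    return all(a < b for a, b in zip(values, values[1:]))


def levels_are_consistent(depth_map, wait_map):
    """Check the only constraint the guideline imposes: relative order."""
    depths = [depth_map[level] for level in sorted(Depth)]
    waits = [wait_map[level] for level in sorted(Wait)]
    return strictly_increasing(depths) and strictly_increasing(waits)
```

Because only the relative order matters, an operator could substitute other numbers as long as `levels_are_consistent` still holds.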

As mentioned above, many workload conditions affect how the scheduler configuration function decides the queue depth and waiting time. All of the conditions are disclosed below. In one embodiment, the cloud storage system 5 supports video streaming, an online transaction processing (OLTP) database, and a mail server. These workloads are for illustration only and do not limit the application of the present invention; workloads other than these three can also be used. Video streaming is a sequential input/output pattern. It has more read requests than write requests, requires a storage block size greater than or equal to 8 KB, produces input/output operations per second or throughput values close to their respective performance adjustment values, and has a latency close to or lower than the performance guarantee value for latency. As shown in Figure 2, the queue depth QW1 of the workload queues in workload queue group W1 is set to medium (5), and the waiting time for each workload queue is set to short latency (20 ms). The request scheduling submodule 103 schedules requests from workload queue group W1 to a first SSD device queue QS1 for the SSD 61 and a first hard disk device queue QH1 for the hard disk 63.

The online transaction processing database is a random input/output pattern. It has as many read requests as write requests, or more read requests than write requests, requires a storage block size greater than or equal to 8 KB, produces input/output operations per second or throughput values close to or lower than their respective performance guarantee values, and has a latency close to or lower than the performance guarantee value for latency. The queue depth QW2 of the workload queues in workload queue group W2 is set to shallow (2), and the waiting time for each workload queue is set to short latency (20 ms). The request scheduling submodule 103 schedules requests from workload queue group W2 to the first SSD device queue QS1 for the SSD 61 and a second SSD device queue QS2 for the SSD 62.

The mail server is a random input/output pattern. It has more read requests than write requests (or as many), requires a storage block size smaller than 8 KB, produces input/output operations per second or throughput values close to their respective performance adjustment values, and has a latency that is not close to or lower than the performance guarantee value for latency. The queue depth QWn of the workload queues in workload queue group Wn is set to deep (7), and the waiting time for each workload queue is set to long latency (100 ms). The request scheduling submodule 103 schedules requests from workload queue group Wn to the first hard disk device queue QH1 for the hard disk 63 and a second hard disk device queue QH2 for the hard disk 64.
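The routing in this embodiment can be sketched as a table from workload queue groups to device queues. The group and queue labels (W1, W2, Wn, QS1, QS2, QH1, QH2) come from the text; the round-robin choice between the two device queues of a group is an assumption, since the specification does not say how a request is assigned to one of them, and the class name `RequestScheduler` is hypothetical.

```python
from collections import deque
from itertools import cycle

# Workload queue group -> device queues it may be scheduled to (from the text).
ROUTES = {
    "W1": ["QS1", "QH1"],  # video streaming: one SSD queue, one hard disk queue
    "W2": ["QS1", "QS2"],  # OLTP database: two SSD queues
    "Wn": ["QH1", "QH2"],  # mail server: two hard disk queues
}


class RequestScheduler:
    """Moves requests from workload groups into device queues."""

    def __init__(self, routes):
        # One FIFO per distinct device queue name.
        self.device_queues = {
            name: deque() for targets in routes.values() for name in targets
        }
        # Assumed policy: alternate between a group's device queues.
        self._next_target = {
            group: cycle(targets) for group, targets in routes.items()
        }

    def dispatch(self, group, request):
        """Append the request to the group's next device queue."""
        target = next(self._next_target[group])
        self.device_queues[target].append(request)
        return target
```

Note that QS1 and QH1 are each shared by two workload groups, so requests from different workloads interleave in the same device queue.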

The complete set of workload conditions is tabulated in Figure 3. Video streaming, the online transaction processing database, and the mail server correspond to conditions No. 1, No. 5, and No. 10, respectively. The remaining conditions are described below.

For condition No. 2, if the received input/output operations per second or throughput value is close to its respective performance adjustment value, the received latency value is close to or lower than the performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is smaller than the medium size, then the queue depth of the workload queue is set to deep and the waiting time for each workload queue is set to medium latency.

For condition No. 3, if the received input/output operations per second or throughput value is close to its respective performance adjustment value, the received latency value is close to or lower than the performance guarantee value, the read/write ratio is less than 1, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to medium and the waiting time for each workload queue is set to short latency.

For condition No. 4, if the received input/output operations per second or throughput value is close to its respective performance adjustment value, the received latency value is close to or lower than the performance guarantee value, the read/write ratio is less than 1, and the storage block size is smaller than the medium size, then the queue depth of the workload queue is set to medium and the waiting time for each workload queue is set to short latency.

For condition No. 6, if the received input/output operations per second or throughput value is close to or lower than its respective performance guarantee value, the received latency value is close to or lower than the performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is smaller than the medium size, then the queue depth of the workload queue is set to medium and the waiting time for each workload queue is set to medium latency.

For condition No. 7, if the received input/output operations per second or throughput value is close to or lower than its respective performance guarantee value, the received latency value is close to or lower than the performance guarantee value, the read/write ratio is less than 1, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to shallow and the waiting time for each workload queue is set to short latency.

For condition No. 8, if the received input/output operations per second or throughput value is close to or lower than its respective performance guarantee value, the received latency value is close to or lower than the performance guarantee value, the read/write ratio is less than 1, and the storage block size is smaller than the medium size, then the queue depth of the workload queue is set to shallow and the waiting time for each workload queue is set to short latency.

For condition No. 9, if the received input/output operations per second or throughput value is close to its respective performance adjustment value, the received latency value is not close to or lower than the performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to medium and the waiting time for each workload queue is set to medium latency.

For condition No. 11, if the received input/output operations per second or throughput value is close to its respective performance adjustment value, the received latency value is not close to or lower than the performance guarantee value, the read/write ratio is less than 1, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to medium and the waiting time for each workload queue is set to short latency.

For condition No. 12, if the received input/output operations per second or throughput value is close to its respective performance adjustment value, the received latency value is not close to or lower than the performance guarantee value, the read/write ratio is less than 1, and the storage block size is smaller than the medium size, then the queue depth of the workload queue is set to medium and the waiting time for each workload queue is set to medium latency.

For condition No. 13, if the received input/output operations per second or throughput value is close to or lower than its respective performance guarantee value, the received latency value is not close to or lower than the performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to shallow and the waiting time for each workload queue is set to short latency.

For condition No. 14, if the received input/output operations per second or throughput value is close to or lower than its respective performance guarantee value, the received latency value is not close to or lower than the performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is smaller than the medium size, then the queue depth of the workload queue is set to shallow and the waiting time for each workload queue is set to medium latency.

For condition No. 15, if the received input/output operations per second or throughput value is close to or lower than its respective performance guarantee value, the received latency value is not close to or lower than the performance guarantee value, the read/write ratio is less than 1, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to shallow and the waiting time for each workload queue is set to short latency.

For condition No. 16, if the received input/output operations per second or throughput value is close to or lower than its respective performance guarantee value, the received latency value is not close to or lower than the performance guarantee value, the read/write ratio is less than 1, and the storage block size is smaller than the medium size, then the queue depth of the workload queue is set to shallow and the waiting time for each workload queue is set to medium latency.
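The sixteen conditions, including the three named workloads (No. 1, No. 5, and No. 10), can be collected into one lookup table. The sketch below is an illustration, not the patented implementation: the four boolean inputs are assumed to have been computed elsewhere (how "close to" a value is judged is not specified here), and the names `Condition`, `RULES`, and `configure` are hypothetical.

```python
from enum import Enum
from typing import NamedTuple


class Depth(Enum):
    SHALLOW = "shallow"
    MEDIUM = "medium"
    DEEP = "deep"


class Wait(Enum):
    SHORT = "short"
    MEDIUM = "medium"
    LONG = "long"


class Condition(NamedTuple):
    near_adjustment: bool  # IOPS/throughput close to the adjustment value
                           # (False: close to or lower than the guarantee value)
    latency_ok: bool       # latency close to or lower than the guarantee value
    read_heavy: bool       # read/write ratio >= 1
    large_block: bool      # storage block size >= medium size


S, M, D = Depth.SHALLOW, Depth.MEDIUM, Depth.DEEP
# Condition -> (condition number, queue depth level, waiting time level)
RULES = {
    Condition(True, True, True, True): (1, M, Wait.SHORT),
    Condition(True, True, True, False): (2, D, Wait.MEDIUM),
    Condition(True, True, False, True): (3, M, Wait.SHORT),
    Condition(True, True, False, False): (4, M, Wait.SHORT),
    Condition(False, True, True, True): (5, S, Wait.SHORT),
    Condition(False, True, True, False): (6, M, Wait.MEDIUM),
    Condition(False, True, False, True): (7, S, Wait.SHORT),
    Condition(False, True, False, False): (8, S, Wait.SHORT),
    Condition(True, False, True, True): (9, M, Wait.MEDIUM),
    Condition(True, False, True, False): (10, D, Wait.LONG),
    Condition(True, False, False, True): (11, M, Wait.SHORT),
    Condition(True, False, False, False): (12, M, Wait.MEDIUM),
    Condition(False, False, True, True): (13, S, Wait.SHORT),
    Condition(False, False, True, False): (14, S, Wait.MEDIUM),
    Condition(False, False, False, True): (15, S, Wait.SHORT),
    Condition(False, False, False, False): (16, S, Wait.MEDIUM),
}


def configure(cond):
    """Return (condition number, Depth, Wait) for the given condition."""
    return RULES[cond]
```

For instance, the mail server profile is `Condition(True, False, True, False)`, which `configure` maps to condition No. 10 with a deep queue and long latency.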

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. A person of ordinary skill in the art may make minor changes and modifications without departing from the spirit and scope of the present invention. The scope of protection of the present invention is therefore defined by the claims of this case.

Claims

1. A workload-aware input/output scheduler in a software-defined hybrid storage system, the software-defined hybrid storage system comprising at least one hard disk and one solid-state drive, characterized in that the workload-aware input/output scheduler in the software-defined hybrid storage system includes a queue management module, a workload characteristic database, and a traffic monitoring module, wherein:

the queue management module manages queues, read requests, and write requests, and includes a request receiving submodule, a request control submodule, and a request scheduling submodule;

the request receiving submodule is used to temporarily store the read requests and write requests;

the request control submodule is used to create workload queues, dynamically configure the workload queues according to a scheduler configuration function, and schedule the read requests and write requests to the workload queues;

the request scheduling submodule is used to create device queues and schedule each read request or write request from the workload queues to a specific device queue;

the workload characteristic database is used to store characteristics of workloads for access;

the traffic monitoring module is used to monitor and continuously record the value of a performance parameter of the software-defined hybrid storage system, and provide the value of the performance parameter to the request control submodule; and

the scheduler configuration function calculates a queue depth and a waiting time for each workload queue based on the characteristics provided by the workload characteristic database and the received value of the performance parameter, in order to adjust the value of the performance parameter of the software-defined hybrid storage system so that, in the future, it falls between a performance guarantee value and a performance adjustment value set for the performance parameter.

2. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 1, characterized in that it further comprises a traffic model module for modeling the storage traffic of the requests from the workloads and providing predicted storage traffic having the characteristics at a specific future point in time.

3. The workload-aware input/output scheduler in the software-defined hybrid storage system according to claim 1, wherein the characteristics are read/write ratio, merge ratio, solid-state drive hit rate, and storage block size.

4. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 3, wherein the performance parameter is the number of input/output operations per second, throughput, latency, or a combination of the three.

5. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 3, wherein the performance guarantee value and the performance adjustment value are defined by the service level agreement or quality of service requirements of the workload.

6. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 4, characterized in that each workload queue is classified as deep, medium, or shallow, and each waiting time is classified as long latency, medium latency, or short latency, wherein the queue depth of the deep workload queue accommodates more read or write requests than the queue depth of the medium workload queue; the queue depth of the medium workload queue accommodates more read or write requests than the queue depth of the shallow workload queue; the storage block size is a medium size; the long latency is longer than the medium latency; and the medium latency is longer than the short latency.

7. The workload-aware input/output scheduler in the software-defined hybrid storage system according to claim 6, characterized in that, if the received input/output operations per second or throughput value is close to its respective performance adjustment value, the received latency value is close to or lower than the performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to a medium level, and the waiting time for each workload queue is set to a short latency.

8. The workload-aware input/output scheduler in the software-defined hybrid storage system according to claim 6, characterized in that, if the received input/output operations per second or throughput value is close to its respective performance adjustment value, the received latency value is close to or lower than the performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is less than the medium size, then the queue depth of the workload queue is set to deep, and the waiting time for each workload queue is set to medium latency.

9. The workload-aware input/output scheduler in the software-defined hybrid storage system according to claim 6, characterized in that, if the received input/output operations per second or throughput value is close to its respective performance adjustment value, the received latency value is close to or lower than the performance guarantee value, and the read/write ratio is less than 1, then the queue depth of the workload queue is set to a medium level, and the waiting time for each workload queue is set to a short latency.

10. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 6, characterized in that, if the received input/output operations per second or throughput value is close to or lower than its respective performance guarantee value, the received latency value is close to or lower than the performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to shallow, and the waiting time for each workload queue is set to short latency.

11. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 6, characterized in that, if the received input/output operations per second or throughput value is close to or lower than its respective performance guarantee value, the received latency value is close to or lower than the performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is less than the medium size, then the queue depth of the workload queue is set to a medium level, and the waiting time for each workload queue is set to a medium latency.

12. The workload-aware input/output scheduler in the software-defined hybrid storage system according to claim 6, characterized in that, if the received input/output operations per second or throughput value is close to or lower than its respective performance guarantee value, the received latency value is close to or lower than the performance guarantee value, and the read/write ratio is less than 1, then the queue depth of the workload queue is set to shallow, and the waiting time for each workload queue is set to short latency.

13. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 6, characterized in that, if the received input/output operations per second or throughput value is close to its respective performance adjustment value, the received latency value is not close to or lower than the performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to a medium level, and the waiting time for each workload queue is set to a medium latency.

14. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 6, characterized in that, if the received input/output operations per second or throughput value is close to its respective performance adjustment value, the received latency value is not close to or lower than the performance guarantee value, the read/write ratio is greater than or equal to 1, and the storage block size is smaller than the medium size, then the queue depth of the workload queue is set to deep, and the waiting time for each workload queue is set to long latency.

15. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 6, characterized in that, if the received input/output operations per second or throughput value is close to its respective performance adjustment value, the received latency value is not close to or lower than the performance guarantee value, the read/write ratio is less than 1, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to a medium level, and the waiting time for each workload queue is set to a short latency.

16. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 6, characterized in that, if the received input/output operations per second or throughput value is close to its respective performance adjustment value, the received latency value is not close to or lower than the performance guarantee value, the read/write ratio is less than 1, and the storage block size is less than the medium size, then the queue depth of the workload queue is set to a medium level, and the waiting time for each workload queue is set to a medium latency.

17. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 6, characterized in that, if the received input/output operations per second or throughput value is close to or lower than its respective performance guarantee value, the received latency value is not close to or lower than the performance guarantee value, and the storage block size is greater than or equal to the medium size, then the queue depth of the workload queue is set to shallow, and the waiting time for each workload queue is set to short latency.

18. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 6, characterized in that, if the received input/output operations per second or throughput value is close to or lower than its respective performance guarantee value, the received latency value is not close to or lower than the performance guarantee value, and the storage block size is smaller than the medium size, then the queue depth of the workload queue is set to shallow, and the waiting time for each workload queue is set to medium latency.

19. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 6, characterized in that, if the solid-state drive hit rate increases, the queue depth of the workload queue remains the same or becomes shallower, and the waiting time for each workload queue remains the same or becomes shorter; otherwise, the queue depth of the workload queue remains the same or becomes deeper, and the waiting time for each workload queue remains the same or becomes longer.

20. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 6, characterized in that, if the merge ratio increases, the queue depth of the workload queue remains the same or becomes shallower, and the waiting time for each workload queue remains the same or becomes shorter; otherwise, the queue depth of the workload queue remains the same or becomes deeper, and the waiting time for each workload queue remains the same or becomes longer.

21. The workload-aware input/output scheduler in a software-defined hybrid storage system according to claim 6, wherein the medium size is 8 KB.