CN116846978A - Resource scheduling method, application identification method and related equipment of cloud computing system - Google Patents
Resource scheduling method, application identification method and related equipment of cloud computing system
- Publication number
- CN116846978A (application CN202210303023.4A)
- Authority
- CN
- China
- Prior art keywords
- application
- type application
- type
- applications
- resource
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
Embodiments of this application disclose a resource scheduling method for a cloud computing system, an application identification method, and related equipment, which are used to improve resource utilization and reduce interference between applications in a computer system. The resource scheduling method is applied to a resource scheduler and includes: obtaining the remaining interference tolerance of each LC-type application among at least one latency-critical (LC) application included in the computer system; and obtaining, from the multiple LC-type applications, the first LC-type application, which has the smallest remaining interference tolerance. If the remaining interference tolerance of the first LC-type application is less than the tolerance lower limit, the first isolation area resources of the first LC-type application are increased. If the remaining interference tolerance of the first LC-type application is greater than the tolerance upper limit, the second isolation area resources of a second LC-type application among the multiple LC-type applications are transferred to the resource sharing area.
Description
Technical Field
The embodiments of this application relate to the computer field, and in particular to a resource scheduling method for a cloud computing system, an application identification method, and related devices.
Background Art
With the continuous development of the Internet, data centers, which serve as information infrastructure, keep growing in scale. However, resource utilization in most data centers remains low. To reduce costs, multiple applications can run concurrently and share the underlying resources, which raises utilization. Although co-locating applications effectively improves resource utilization, applications deployed on the same physical machine compete for shared resources, so interference between applications occurs frequently.
In one resource scheduling method, each latency-critical (LC) application has fixed isolation area resources, and reinforcement learning is applied to the shared resources that cannot be isolated in order to determine a resource allocation scheme that meets the quality-of-service target of the LC-type applications.
In this method, because each LC-type application has fixed isolation area resources, an application that needs few resources may hold far more isolation area resources than it actually requires, which leads to low resource utilization.
Summary of the Invention
The embodiments of this application provide a resource scheduling method for a cloud computing system, an application identification method, and related devices. In the resource scheduling method, different scheduling actions are chosen according to the remaining interference tolerance of each LC-type application in the computer system: either the isolation area resources of an LC-type application are increased, or the isolation area resources of an LC-type application are reduced so that the shared area resources of the computer system grow. Isolation area resources and shared area resources are thereby allocated flexibly, which improves resource utilization. Scheduling the isolation area and shared area resources also reduces interference between applications and improves system performance.
A first aspect of the embodiments of this application provides a resource scheduling method for a cloud computing system. The method is applied to a resource scheduler and includes:
Multiple LC-type applications run in the computer system. The resource scheduler can either compute the remaining interference tolerance of each LC-type application itself or receive it from a computing device. The remaining interference tolerance reflects how strongly each LC-type application is interfered with: the smaller the value, the more interference the application is experiencing. The resource scheduler then selects, from the multiple LC-type applications, the first LC-type application, that is, the one with the smallest remaining interference tolerance, and compares that tolerance with the tolerance lower limit to decide whether the isolation area resources of the first LC-type application need to be adjusted. If the remaining interference tolerance of the first LC-type application is less than the tolerance lower limit, the first LC-type application is seriously interfered with and short of resources, so the first isolation area resources of the first LC-type application are increased. If the remaining interference tolerance of the first LC-type application is greater than the tolerance upper limit, then, because the first LC-type application has the smallest remaining interference tolerance of all LC-type applications, every LC-type application can be considered free of interference or only slightly interfered with, and the second isolation area resources of a second LC-type application among the multiple LC-type applications can be transferred to the resource sharing area. The remaining interference tolerance of the second LC-type application is greater than that of the first LC-type application. Optionally, it may be the largest among the multiple LC-type applications.
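The decision rule of the first aspect can be illustrated with a minimal Python sketch. The names `LCApp` and `decide`, the action labels, and the numeric thresholds are hypothetical and chosen only for illustration; the sketch also assumes the optional variant in which the second LC-type application is the one with the largest remaining interference tolerance.

```python
from dataclasses import dataclass


@dataclass
class LCApp:
    """Hypothetical record for one LC-type application."""
    name: str
    remaining_tolerance: float  # smaller value = more interference


def decide(apps, lower, upper):
    """Return (action, app_name) for one scheduling round.

    apps  : non-empty list of LCApp
    lower : tolerance lower limit
    upper : tolerance upper limit
    """
    # First LC-type application: smallest remaining interference tolerance.
    first = min(apps, key=lambda a: a.remaining_tolerance)
    if first.remaining_tolerance < lower:
        return ("grow_isolation", first.name)
    if first.remaining_tolerance > upper:
        # Assumed optional variant: pick the largest tolerance as the
        # second application; it must be strictly greater than the first.
        second = max(apps, key=lambda a: a.remaining_tolerance)
        if second.remaining_tolerance > first.remaining_tolerance:
            return ("release_to_shared", second.name)
    return ("none", None)
```

When the smallest tolerance lies between the two limits, the sketch takes no action, which matches the text in that neither condition is met.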
It can be seen from the above technical solution that the embodiments of this application have the following advantages:
Different scheduling actions are chosen according to the remaining interference tolerance of each LC-type application in the computer system: either the isolation area resources of an LC-type application are increased, or the isolation area resources of an LC-type application are reduced so that the shared area resources of the computer system grow. Isolation area resources and shared area resources are thereby allocated flexibly, which improves resource utilization. Scheduling the isolation area and shared area resources also reduces interference between applications and improves system performance.
In some optional embodiments of the first aspect, the resource scheduler may also obtain the current system entropy of the computer system, which indicates the current degree of interference between applications in the computer system. The larger the current system entropy, the greater the interference between applications and the worse the performance of the computer system. The resource scheduler determines the area for resource scheduling by comparing the current system entropy with the system entropy threshold, and by comparing the remaining interference tolerance of the first LC-type application with the tolerance lower limit and the tolerance upper limit. Specifically, if the current system entropy is greater than or equal to the system entropy threshold and the remaining interference tolerance of the first LC-type application is less than the tolerance lower limit, the first LC-type application can be considered seriously interfered with and short of resources, and its first isolation area resources need to be increased. If the current system entropy is greater than or equal to the system entropy threshold and the remaining interference tolerance of the first LC-type application is greater than the tolerance upper limit, the poor performance of the current system can be attributed to the LC-type applications holding too many isolation area resources, so the resource scheduler can transfer the second isolation area resources of the second LC-type application to the resource sharing area.
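A minimal Python sketch of this entropy-gated variant follows. The function name, parameter names, and action labels are hypothetical; the sketch only encodes the two conditions stated above and assumes that no scheduling happens when the system entropy is below the threshold.

```python
def decide_with_entropy(min_tolerance, system_entropy,
                        entropy_threshold, lower, upper):
    """Pick a scheduling action from the smallest remaining tolerance
    and the current system entropy (all inputs are plain floats)."""
    # Gate: only schedule when interference is high enough system-wide.
    if system_entropy < entropy_threshold:
        return "none"
    if min_tolerance < lower:
        return "grow_first_isolation"
    if min_tolerance > upper:
        return "release_second_to_shared"
    return "none"
```

Compared with the tolerance-only rule, the extra gate skips scheduling rounds in which the system as a whole is performing acceptably.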
In the embodiments of this application, the remaining interference tolerance and the system entropy are combined to decide whether to perform resource scheduling. This places stricter conditions on resource scheduling, avoids scheduling when it is unnecessary, and saves computing resources.
In some optional embodiments of the first aspect, the resource scheduler can increase the first isolation area resources of the first LC-type application in several ways. Optionally, if the computer system contains a third LC-type application whose remaining interference tolerance is greater than the tolerance upper limit and which has strippable third isolation area resources, the resource scheduler transfers resources from the third isolation area resources to the first isolation area corresponding to the first LC-type application, thereby increasing the first isolation area resources. Optionally, if no such third LC-type application exists in the computer system, the resource scheduler transfers resources from the resource sharing area to the first isolation area, thereby increasing the first isolation area resources. The absence of such a third LC-type application covers the following cases: no LC-type application in the computer system has a remaining interference tolerance greater than the tolerance upper limit; or no LC-type application in the computer system has strippable isolation area resources; or none of the LC-type applications whose remaining interference tolerance is greater than the tolerance upper limit has strippable isolation area resources. This is not specifically limited here.
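The choice of where the extra resources come from can be sketched in Python as below. The data layout (a dict mapping an application name to a pair of remaining tolerance and strippable resource units) and the function name `pick_source` are hypothetical. Picking the eligible donor with the largest remaining tolerance is an added assumption; the text only requires that the donor exceed the tolerance upper limit and have strippable resources.

```python
def pick_source(apps, first_name, upper):
    """Return the name of the donor LC-type application, or "shared".

    apps       : dict name -> (remaining_tolerance, strippable_units)
    first_name : the application whose isolation area must grow
    upper      : tolerance upper limit
    """
    donors = [
        (tolerance, name)
        for name, (tolerance, strippable) in apps.items()
        if name != first_name and tolerance > upper and strippable > 0
    ]
    if donors:
        # Assumption: prefer the donor with the most headroom.
        return max(donors)[1]
    # No eligible third LC-type application: fall back to the shared area.
    return "shared"
```

The fallback branch covers all three "no such third application" cases listed above, since each of them leaves the `donors` list empty.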
In the embodiments of this application, when the first isolation area resources of the first LC-type application are increased, priority is given to transferring resources from the isolation area resources of other LC-type applications, which minimizes the adverse effect on system performance and improves the practicality of the technical solution. In addition, different situations call for different ways of increasing the first isolation area resources, so the solution can adapt to different scenarios, which improves its flexibility and adaptability.
In some optional embodiments of the first aspect, when the computer system includes only LC-type applications, the resource scheduler can obtain the amount of interference that each LC-type application can tolerate and the amount of interference that each LC-type application actually experiences, and determine the current system entropy from these two quantities.
In some optional embodiments of the first aspect, when the computer system includes LC-type applications and at least one best-effort (BE) application, the resource scheduler can obtain a first application identifier and a second application identifier from an application distinguisher or from a user in order to distinguish LC-type applications from BE-type applications. The first application identifier indicates an LC-type application, and the second application identifier indicates a BE-type application. The resource scheduler determines the LC-type applications in the computer system according to the first application identifier and the BE-type applications according to the second application identifier. The resource scheduler obtains the amount of interference that each LC-type application can tolerate and the amount of interference that each LC-type application actually experiences, and determines the entropy of the multiple LC-type applications from these two quantities. It also obtains, for each BE-type application among the at least one BE-type application, a first instructions-per-cycle value measured when the application runs alone and a second instructions-per-cycle value measured after the application is interfered with, and determines the entropy of the at least one BE-type application from the first and second instructions-per-cycle values. The current system entropy is then determined from the entropy of the multiple LC-type applications and the entropy of the at least one BE-type application. Because the quality of service of LC-type applications is far more important than that of BE-type applications, the entropy of the multiple LC-type applications is given a greater weight than the entropy of the at least one BE-type application.
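The weighted combination can be sketched in Python as follows. The text above does not give the formulas, so everything numeric here is an illustrative assumption rather than the method of the application: the LC entropy is taken as the mean ratio of actual to tolerable interference (capped at 1), the BE entropy as the mean relative loss in instructions per cycle, and the weight `w_lc = 0.8` is arbitrary. The only property taken from the text is that the LC weight exceeds the BE weight.

```python
def lc_entropy(lc):
    """lc: list of (tolerable_interference, actual_interference).
    Assumed form: mean of actual/tolerable, capped at 1."""
    return sum(min(actual / tolerable, 1.0) for tolerable, actual in lc) / len(lc)


def be_entropy(be):
    """be: list of (ipc_alone, ipc_interfered).
    Assumed form: mean relative IPC loss, floored at 0."""
    return sum(max(0.0, 1.0 - after / alone) for alone, after in be) / len(be)


def system_entropy(lc, be, w_lc=0.8):
    """Weighted sum of LC and BE entropy; LC must carry the larger weight."""
    if not be:
        # Only LC-type applications are present.
        return lc_entropy(lc)
    if not 0.5 < w_lc <= 1.0:
        raise ValueError("LC weight must be greater than the BE weight")
    return w_lc * lc_entropy(lc) + (1.0 - w_lc) * be_entropy(be)
```

With these assumed forms, a larger value means more interference, which agrees with the interpretation of system entropy given earlier in the description.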
In some optional embodiments of the first aspect, instead of computing the current system entropy itself, the resource scheduler may obtain the current system entropy from a computing device. This is not specifically limited here.
In the embodiments of this application, the resource scheduler can determine the current system entropy in different ways depending on the types of applications in the computer system, and use it to evaluate the performance of the computer system. This lets the solution apply to different scenarios and improves its flexibility and practicality.
In some optional embodiments of the first aspect, the resource scheduler may be triggered to perform resource scheduling in other ways. The resource scheduler obtains first scheduling information or second scheduling information from a quality-of-service predictor, where the first scheduling information indicates that the isolation area resources of a target LC-type application among the multiple LC-type applications should be increased, and the second scheduling information indicates that the isolation area resources of the target LC-type application should be reduced. The resource scheduler then increases the isolation area resources of the target LC-type application according to the first scheduling information, or reduces them according to the second scheduling information.
In the embodiments of this application, resource scheduling can be triggered in several ways: by the remaining interference tolerance alone, by the remaining interference tolerance together with the current system entropy, or by indication information. This accommodates different requirements in practice and improves the flexibility and adaptability of the technical solution. In addition, the resource scheduler can determine the area for resource scheduling directly from the scheduling information, so it can still perform resource scheduling even when it cannot obtain the remaining interference tolerance or the current system entropy, which keeps the resource scheduler working and improves the reliability of the technical solution.
In some optional embodiments of the first aspect, the resource scheduler can compute the system entropy of the computer system after resource scheduling itself, or obtain it from a computing device, and then compare it with the current system entropy. If the system entropy after resource scheduling is less than the current system entropy, system performance has improved and the resource scheduling can be deemed successful.
In the embodiments of this application, this verification ensures that resource scheduling does not lead to a worse result, that is, it does not aggravate interference between applications, which improves the reliability of the technical solution.
A second aspect of the embodiments of this application provides a resource scheduling method for a cloud computing system. The method is applied to a quality-of-service predictor and includes: the quality-of-service predictor obtains multiple network receive queue lengths corresponding to a target LC-type application and computes their mean to obtain the average network receive queue length. If the average network receive queue length is greater than a length threshold, the predictor sends first scheduling information to the resource scheduler, where the first scheduling information indicates that the isolation area resources of the target LC-type application should be increased. If the average network receive queue length is less than or equal to the length threshold, and the proportion of the network receive queue lengths whose value is 0 is greater than a proportion threshold, the predictor sends second scheduling information to the resource scheduler, where the second scheduling information indicates that the isolation area resources of the target LC-type application should be reduced. The length threshold and the proportion threshold can be determined by the quality-of-service predictor from the historical running state of the computer system, or set by the user. This is not specifically limited here.
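The predictor's rule can be written as a short Python sketch. The function name `predict`, the return labels, and the sample values are hypothetical; the two conditions follow the text, and the sketch assumes that no scheduling information is sent when neither condition holds.

```python
def predict(queue_lengths, length_threshold, zero_ratio_threshold):
    """Return "increase", "decrease", or None for one target LC-type
    application, given sampled network receive queue lengths."""
    if not queue_lengths:
        return None  # nothing sampled, nothing to report
    average = sum(queue_lengths) / len(queue_lengths)
    if average > length_threshold:
        # First scheduling information: grow the isolation area.
        return "increase"
    zero_ratio = queue_lengths.count(0) / len(queue_lengths)
    if zero_ratio > zero_ratio_threshold:
        # Second scheduling information: shrink the isolation area.
        return "decrease"
    return None
```

For example, `predict([10, 12, 8], 5, 0.5)` reports a backlog and asks for more resources, while a window that is mostly empty queues asks for fewer.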
In the embodiments of this application, the quality-of-service predictor monitors the buffer corresponding to each LC-type application and can decide whether resource scheduling is needed without computing the system entropy or the tolerance, which simplifies the process and saves computing resources.
A third aspect of the embodiments of this application provides an application identification method. The method is applied to an application distinguisher and includes: the application distinguisher obtains the average total network bandwidth of each of multiple applications in the computer system in the current stage, and determines from these averages the interval coefficient of variation of the total network bandwidth of each application in the current stage. If the interval coefficients of variation of the total network bandwidth are greater than an interval coefficient threshold, the application distinguisher obtains, for each application, the absolute value of the difference between its send/receive bandwidth ratio at the end of the current stage and its send/receive bandwidth ratio at the start of the next stage. The application distinguisher also obtains, for each application, the coefficient of variation of its send/receive bandwidth ratio within target time periods before and after the current stage. Then, according to the absolute differences or the send/receive bandwidth ratio coefficients of variation, it determines that the identifier of each LC-type application among the multiple applications is the first application identifier and that the identifier of each BE-type application is the second application identifier. It sends the first application identifier and the second application identifier to the resource scheduler so that the resource scheduler can distinguish BE-type applications from LC-type applications.
In the embodiments of this application, the application distinguisher can identify the application types of multiple applications in the computer system and inform the resource scheduler of the application identifier of each application, which provides technical support for resource scheduling and improves the feasibility of the technical solution.
In some optional embodiments of the third aspect, determining, according to the absolute differences or the send/receive bandwidth ratio coefficients of variation, that the identifier of each LC-type application is the first application identifier and that the identifier of each BE-type application is the second application identifier includes: determining, as BE-type applications, those applications whose absolute difference is greater than a difference threshold and/or whose send/receive bandwidth ratio coefficient of variation is greater than a coefficient threshold; marking the identifiers of the BE-type applications as the second application identifier; and marking the identifiers of all other applications as the first application identifier.
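The labeling step can be sketched in Python as below. The input layout, function names, and thresholds are hypothetical. The sketch implements only the "or" reading of "and/or", omits the preceding check on the total bandwidth interval coefficient of variation, and assumes the coefficient of variation is the population standard deviation divided by the mean.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean (0 if mean is 0)."""
    m = mean(samples)
    return pstdev(samples) / m if m else 0.0


def classify(apps, diff_threshold, cv_threshold):
    """apps: dict name -> (end_ratio, next_start_ratio, ratio_samples),
    where each ratio is a send/receive bandwidth ratio.
    Returns dict name -> "BE" or "LC"."""
    labels = {}
    for name, (end_ratio, next_start_ratio, samples) in apps.items():
        jump = abs(end_ratio - next_start_ratio)
        is_be = (jump > diff_threshold
                 or coefficient_of_variation(samples) > cv_threshold)
        # Everything that is not flagged as BE is labeled LC.
        labels[name] = "BE" if is_be else "LC"
    return labels
```

Requiring both conditions instead (the "and" reading) would be the stricter alternative, flagging fewer applications as BE-type.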
In the embodiments of this application, the application type can be determined on several bases. Some cases place modest requirements on the distinguishing function of the application distinguisher and allow a certain error tolerance, while others are stricter, which helps to distinguish LC-type applications from BE-type applications clearly and improves the reliability of the system.
A fourth aspect of the embodiments of this application provides a resource scheduling method for a cloud computing system. The method is applied to a resource scheduler and includes: obtaining first scheduling information or second scheduling information from a quality-of-service predictor, where the first scheduling information indicates that the isolation area resources of a target LC-type application among multiple LC-type applications should be increased, and the second scheduling information indicates that the isolation area resources of the target LC-type application should be reduced; and increasing the isolation area resources of the target LC-type application according to the first scheduling information, or reducing the isolation area resources of the target LC-type application according to the second scheduling information.
In the embodiments of this application, the resource scheduler can determine the area for resource scheduling directly from the scheduling information, so it can still perform resource scheduling even when it cannot obtain the remaining interference tolerance or the current system entropy, which keeps the resource scheduler working and improves the reliability of the technical solution.
本申请实施例第五方面提供了一种资源调度系统,该资源调度系统包括资源调度器,资源调度器用于:获取计算机系统中包括的至少一个延迟敏感LC型应用中每个LC型应用的剩余干扰容忍度。从多个LC型应用中,获取剩余干扰容忍度最小的第一LC型应用。若第一LC型应用的剩余干扰容忍度小于容忍度下限,则增加第一LC型应用的第一隔离区资源。若第一LC型应用的剩余干扰容忍度大于容忍度上限,则将多个LC型应用中的第二LC型应用的第二隔离区资源转移至资源共享区。In a fifth aspect, an embodiment of the present application provides a resource scheduling system, the resource scheduling system comprising a resource scheduler, the resource scheduler being used to: obtain the residual interference tolerance of each LC type application in at least one delay-sensitive LC type application included in the computer system. From multiple LC type applications, obtain a first LC type application with the smallest residual interference tolerance. If the residual interference tolerance of the first LC type application is less than the lower limit of the tolerance, increase the first isolation area resources of the first LC type application. If the residual interference tolerance of the first LC type application is greater than the upper limit of the tolerance, transfer the second isolation area resources of the second LC type application in the multiple LC type applications to the resource sharing area.
在第五方面的一些可选实施例中,资源调度器还用于:获取计算机系统的当前系统熵,当前系统熵用于当前指示计算机系统中应用之间的干扰程度。若当前系统熵大于或等于系统熵阈值,且第一LC型应用的剩余干扰容忍度小于容忍度下限,则增加第一隔离区资源。若当前系统熵大于或等于系统熵阈值,且第一LC型应用的剩余干扰容忍度大于容忍度上限,则将第二LC型应用的第二隔离区资源转移至资源共享区。In some optional embodiments of the fifth aspect, the resource scheduler is further used to: obtain the current system entropy of the computer system, the current system entropy is used to currently indicate the degree of interference between applications in the computer system. If the current system entropy is greater than or equal to the system entropy threshold, and the remaining interference tolerance of the first LC type application is less than the lower limit of the tolerance, then increase the first isolation area resources. If the current system entropy is greater than or equal to the system entropy threshold, and the remaining interference tolerance of the first LC type application is greater than the upper limit of the tolerance, then transfer the second isolation area resources of the second LC type application to the resource sharing area.
在第五方面的一些可选实施例中,资源调度器具体用于:若计算机系统中存在剩余干扰容忍度大于容忍度上限,且具有可剥离的第三隔离区资源的第三LC型应用,则将第三隔离区资源中的资源转移至第一LC型应用对应的第一隔离区,以增加第一隔离区资源。若计算机系统中不存在第三LC型应用,则将资源共享区的资源转移至第一隔离区,以增加第一隔离区资源。In some optional embodiments of the fifth aspect, the resource scheduler is specifically configured to: if there is a third LC type application in the computer system whose remaining interference tolerance is greater than the tolerance upper limit and has a third isolated area resource that can be stripped, transfer resources in the third isolated area resources to the first isolated area corresponding to the first LC type application to increase the resources of the first isolated area. If there is no third LC type application in the computer system, transfer resources in the resource sharing area to the first isolated area to increase the resources of the first isolated area.
In some optional embodiments of the fifth aspect, the resource scheduler is specifically configured to: obtain the amount of interference that each LC-type application can tolerate and the amount of interference that each LC-type application actually suffers; and determine the current system entropy from these two quantities.
In some optional embodiments of the fifth aspect, the resource scheduler is further configured to: obtain a first application identifier and a second application identifier from an application distinguisher or from a user, where the first application identifier indicates an LC-type application and the second application identifier indicates a BE-type application; determine the LC-type applications in the computer system according to the first application identifier; and determine the BE-type applications in the computer system according to the second application identifier.
The resource scheduler is specifically configured to: obtain the amount of interference that each LC-type application can tolerate and the amount of interference that each LC-type application actually suffers; determine the entropy of the multiple LC-type applications from these two quantities; obtain, for each BE-type application among at least one BE-type application, a first number of instructions per cycle measured when the BE-type application runs alone and a second number of instructions per cycle measured after the BE-type application is interfered with; determine the entropy of the at least one BE-type application from the first and second numbers of instructions per cycle; and determine the current system entropy from the entropy of the multiple LC-type applications and the entropy of the at least one BE-type application.
In some optional embodiments of the fifth aspect, the resource scheduler is further configured to obtain first scheduling information or second scheduling information from a service quality predictor, where the first scheduling information indicates that the isolation area resources of a target LC-type application among the multiple LC-type applications are to be increased, and the second scheduling information indicates that the isolation area resources of the target LC-type application are to be reduced. The resource scheduler increases the isolation area resources of the target LC-type application according to the first scheduling information, or reduces them according to the second scheduling information.
In some optional embodiments of the fifth aspect, the resource scheduler is further configured to obtain the system entropy of the computer system after resource scheduling. If the system entropy after resource scheduling is less than the current system entropy, the resource scheduler determines that the resource scheduling succeeded.
In some optional embodiments of the fifth aspect, the resource scheduling system further includes a service quality predictor. The service quality predictor is configured to: obtain multiple network receive queue lengths corresponding to the target LC-type application; calculate the mean of the multiple network receive queue lengths to obtain an average network receive queue length; if the average network receive queue length is greater than a length threshold, send first scheduling information to the resource scheduler, the first scheduling information indicating that the isolation area resources of the target LC-type application are to be increased; and if the average network receive queue length is less than or equal to the length threshold and the proportion of queue length samples equal to 0 among the multiple network receive queue lengths is greater than a proportion threshold, send second scheduling information to the resource scheduler, the second scheduling information indicating that the isolation area resources of the target LC-type application are to be reduced.
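The predictor rule above reduces to two threshold tests on a window of queue length samples. The following is a minimal sketch under stated assumptions: the function name `predict`, the string constants standing in for the first and second scheduling information, and the choice to return `None` when neither condition holds are all hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence

INCREASE = "first_scheduling_info"   # request more isolation area resources
DECREASE = "second_scheduling_info"  # request fewer isolation area resources


def predict(queue_lengths: Sequence[int], length_threshold: float,
            ratio_threshold: float) -> Optional[str]:
    """Map sampled network receive queue lengths to scheduling information."""
    if not queue_lengths:
        return None  # no samples, so no decision (assumption)
    if mean(queue_lengths) > length_threshold:
        return INCREASE
    # Mean is at or below the threshold: check how often the queue was empty.
    zero_ratio = sum(1 for q in queue_lengths if q == 0) / len(queue_lengths)
    if zero_ratio > ratio_threshold:
        return DECREASE
    return None
```

A mostly empty queue suggests the application has more isolated resources than it needs, which is why the second test only runs when the mean is already low.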
In some optional embodiments of the fifth aspect, the resource scheduling system further includes an application distinguisher. The application distinguisher is configured to: obtain the average total network bandwidth of each of multiple applications in the computer system in the current stage; determine, from these averages, the interval coefficient of variation of the total network bandwidth of each application in the current stage; if the interval coefficients of variation are greater than an interval coefficient threshold, obtain, for each application, the absolute value of the difference between its send/receive bandwidth ratio at the end of the current stage and its send/receive bandwidth ratio at the start of the next stage; obtain, for each application, the coefficient of variation of its send/receive bandwidth ratio within target time periods before and after the current stage; determine, from the absolute differences or the send/receive bandwidth ratio coefficients of variation, that the identifiers of the LC-type applications among the multiple applications are the first application identifier and that the identifiers of the BE-type applications are the second application identifier; and send the first application identifier and the second application identifier to the resource scheduler so that the resource scheduler can distinguish BE-type applications from LC-type applications.
In some optional embodiments of the fifth aspect, the application distinguisher is specifically configured to: determine, among the multiple applications, that an application whose absolute difference is greater than a difference threshold and/or whose send/receive bandwidth ratio coefficient of variation is greater than a coefficient threshold is a BE-type application; mark the identifier of each BE-type application as the second application identifier; and mark the identifier of every other application among the multiple applications as the first application identifier.
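The classification rule above can be illustrated with a short sketch. It assumes the coefficient of variation is the population standard deviation divided by the mean, and it implements the "or" reading of "and/or"; the function names and the sample layout are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """Population standard deviation divided by the mean (0.0 if mean is 0)."""
    m = mean(samples)
    return pstdev(samples) / m if m else 0.0


def classify(end_ratio: float, next_start_ratio: float,
             ratios_around_boundary: Sequence[float],
             diff_threshold: float, cv_threshold: float) -> str:
    """Return "BE" or "LC" from send/receive bandwidth ratio statistics."""
    # Jump in the send/receive ratio across the stage boundary.
    diff = abs(end_ratio - next_start_ratio)
    # Variability of the ratio in the windows before and after the stage.
    cv = coefficient_of_variation(ratios_around_boundary)
    if diff > diff_threshold or cv > cv_threshold:
        return "BE"   # would be marked with the second application identifier
    return "LC"       # would be marked with the first application identifier
```

An application with a steady send/receive ratio falls through both tests and is treated as LC-type.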
In some optional embodiments of the fifth aspect, the resource scheduling system further includes a computing device. The computing device is configured to send to the resource scheduler at least one of the following: the current system entropy, the amount of interference that each LC-type application can tolerate, the amount of interference that each LC-type application actually suffers, the first number of instructions per cycle of each BE-type application when it runs alone, or the second number of instructions per cycle of each BE-type application after it is interfered with.
In some optional embodiments of the fifth aspect, the computing device is further configured to: obtain multiple network receive queue lengths corresponding to the target LC-type application; calculate the mean of the multiple network receive queue lengths to obtain an average network receive queue length; and send the average network receive queue length to the service quality predictor.
In some optional embodiments of the fifth aspect, the computing device is further configured to send to the application distinguisher at least one of the following: the average total network bandwidth of each of the multiple applications in the computer system in the current stage; the interval coefficient of variation of the total network bandwidth of each application in the current stage; the absolute value of the difference between each application's send/receive bandwidth ratio at the end of the current stage and its send/receive bandwidth ratio at the start of the next stage; or the coefficient of variation of each application's send/receive bandwidth ratio within the target time periods before and after the current stage.
The resource scheduling system is configured to implement the method of any one of the first to fourth aspects. The beneficial effects of the fifth aspect are similar to those of the first to fourth aspects and are not repeated here.
A sixth aspect of the embodiments of this application provides a resource scheduler applied to a cloud computing system. The resource scheduler includes:
an obtaining unit, configured to obtain the remaining interference tolerance of each LC-type application among at least one latency-critical (LC) type application included in a computer system, and to obtain, from the multiple LC-type applications, a first LC-type application with the smallest remaining interference tolerance; and
a processing unit, configured to:
if the remaining interference tolerance of the first LC-type application is less than the tolerance lower limit, increase the first isolation area resources of the first LC-type application; and
if the remaining interference tolerance of the first LC-type application is greater than the tolerance upper limit, transfer the second isolation area resources of a second LC-type application among the multiple LC-type applications to the resource sharing area.
The resource scheduler is configured to implement the method of the first aspect. The beneficial effects of the sixth aspect are similar to those of the first aspect and are not repeated here.
A seventh aspect of the embodiments of this application provides a service quality predictor applied to a cloud computing system. The service quality predictor includes:
an obtaining unit, configured to obtain multiple network receive queue lengths corresponding to a target LC-type application;
a processing unit, configured to calculate the mean of the multiple network receive queue lengths to obtain an average network receive queue length; and
a sending unit, configured to:
if the average network receive queue length is greater than the length threshold, send first scheduling information to the resource scheduler, the first scheduling information indicating that the isolation area resources of the target LC-type application are to be increased; and
if the average network receive queue length is less than or equal to the length threshold and the proportion of queue length samples equal to 0 among the multiple network receive queue lengths is greater than the proportion threshold, send second scheduling information to the resource scheduler, the second scheduling information indicating that the isolation area resources of the target LC-type application are to be reduced.
The service quality predictor is configured to implement the method of the second aspect. The beneficial effects of the seventh aspect are similar to those of the second aspect and are not repeated here.
An eighth aspect of the embodiments of this application provides an application distinguisher, including:
an obtaining unit, configured to obtain the average total network bandwidth of each of multiple applications in a computer system in the current stage;
a processing unit, configured to determine, from these averages, the interval coefficient of variation of the total network bandwidth of each application in the current stage;
the obtaining unit being further configured to obtain, if the interval coefficients of variation are greater than the interval coefficient threshold, the absolute value of the difference between each application's send/receive bandwidth ratio at the end of the current stage and its send/receive bandwidth ratio at the start of the next stage;
the obtaining unit being further configured to obtain the coefficient of variation of each application's send/receive bandwidth ratio within the target time periods before and after the current stage;
the processing unit being further configured to determine, from the absolute differences or the send/receive bandwidth ratio coefficients of variation, that the identifiers of the LC-type applications among the multiple applications are the first application identifier and that the identifiers of the BE-type applications are the second application identifier; and
a sending unit, configured to send the first application identifier and the second application identifier to the resource scheduler so that the resource scheduler can distinguish BE-type applications from LC-type applications.
The application distinguisher is configured to implement the method of the third aspect. The beneficial effects of the eighth aspect are similar to those of the third aspect and are not repeated here.
A ninth aspect of the embodiments of this application provides a resource scheduler applied to a cloud computing system. The resource scheduler includes:
an obtaining unit, configured to obtain first scheduling information or second scheduling information from a service quality predictor, where the first scheduling information indicates that the isolation area resources of a target LC-type application among multiple LC-type applications are to be increased, and the second scheduling information indicates that the isolation area resources of the target LC-type application are to be reduced; and
a processing unit, configured to increase the isolation area resources of the target LC-type application according to the first scheduling information, or to reduce the isolation area resources of the target LC-type application according to the second scheduling information.
The resource scheduler is configured to implement the method of the fourth aspect. The beneficial effects of the ninth aspect are similar to those of the fourth aspect and are not repeated here.
A tenth aspect of the embodiments of this application provides a computer device including a processor, a memory, and a communication interface that are connected to one another. The processor is configured to execute the method of any one of the first to fourth aspects. The beneficial effects of this aspect are similar to those of the first to fourth aspects and are not repeated here.
An eleventh aspect of the embodiments of this application provides a computer-readable storage medium storing a program. When a computer executes the program, the computer performs the method of any one of the first to fourth aspects.
A twelfth aspect of the embodiments of this application provides a computer program product, characterized in that when the computer program product is executed on a computer, the computer performs the method of any one of the first to fourth aspects.
The beneficial effects of the eleventh and twelfth aspects are similar to those of the first to fourth aspects and are not repeated here.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1a is a schematic diagram of an isolated resource area and a shared resource area;
FIG. 1b is another schematic diagram of an isolated resource area and a shared resource area;
FIG. 2 is a schematic diagram of the system architecture of a high-concurrency cloud computing system;
FIG. 3 is a schematic flowchart of a resource scheduling method provided in an embodiment of this application;
FIG. 4 is another schematic flowchart of a resource scheduling method provided in an embodiment of this application;
FIG. 5 is another schematic flowchart of a resource scheduling method provided in an embodiment of this application;
FIG. 6 is another schematic flowchart of a resource scheduling method provided in an embodiment of this application;
FIG. 7 is another schematic flowchart of a resource scheduling method provided in an embodiment of this application;
FIG. 8 is a schematic diagram of code provided in an embodiment of this application;
FIG. 9 is another schematic flowchart of a resource scheduling method provided in an embodiment of this application;
FIG. 10a is a schematic diagram of the send bandwidth ratio of a Redis application;
FIG. 10b is a schematic diagram of the send bandwidth ratio of a Terasort application based on the Spark framework;
FIG. 11 is a schematic flowchart of an application identification method provided in an embodiment of this application;
FIG. 12 is a schematic flowchart of an application identification method provided in an embodiment of this application;
FIG. 13a is a schematic structural diagram of a resource scheduling system provided in an embodiment of this application;
FIG. 13b is another schematic structural diagram of a resource scheduling system provided in an embodiment of this application;
FIG. 14 is a schematic structural diagram of a resource scheduler provided in an embodiment of this application;
FIG. 15 is a schematic structural diagram of a service quality predictor provided in an embodiment of this application;
FIG. 16 is a schematic structural diagram of an application distinguisher provided in an embodiment of this application;
FIG. 17 is a schematic structural diagram of a computer device provided in an embodiment of this application.
DETAILED DESCRIPTION
The embodiments of this application provide a resource scheduling method, an application identification method, and related equipment for a cloud computing system. In the resource scheduling method, different resource scheduling modes are selected according to the values of the remaining interference tolerance of the LC-type applications in the computer system. The method either increases the isolation area resources of an LC-type application, or reduces the isolation area resources of an LC-type application so as to increase the shared area resources of the computer system. Isolation area resources and shared area resources are thus allocated flexibly to match the actual needs of the LC-type applications, which avoids wasting resources and improves system performance.
The embodiments of this application are described below with reference to the accompanying drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, the claims, and the above drawings of this application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. Terms used in this way are interchangeable where appropriate; they are merely the means of distinguishing objects with the same attributes when describing the embodiments of this application. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that contains a series of units is not necessarily limited to those units, but may include other units that are not explicitly listed or that are inherent to the process, method, product, or device. "At least one" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression refers to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.
First, the technical terms and related concepts that may be involved in the embodiments of this application are explained.
1. Isolation area resources and shared area resources.
The schedulable resources of the applications are divided into two areas: the isolation area and the shared area. Resources located in an isolation area are called isolation area resources, and resources located in the shared area are called shared area resources. An isolation area, also called an isolated resource area, is a resource area that only the application owning that isolation area can use; no other application can use the resources in it. The shared area, also called the shared resource area, is a resource area that all applications can use jointly; shared area resources are available to every application. A system has only one shared area.
For clarity, refer to FIG. 1a and FIG. 1b, which are schematic diagrams of the isolated resource area and the shared resource area.
As shown in FIG. 1a, before resource scheduling the system contains only a shared resource area and no isolated resource area. That is, all applications use the resources in the shared resource area. If the service quality of application 1 and application 2 degrades, meaning that application 1 and application 2 are suffering interference from other applications, resource scheduling is required to allocate part of the resources in the shared resource area to application 1 and application 2. After resource scheduling, application 1 and application 2 each have exclusive isolation area resources, which reduces the interference they suffer.
As shown in FIG. 1b, in some optional embodiments each application can have its own exclusive isolated resource area while also using resources in the shared resource area.
Note that in practice the service quality of LC-type applications matters far more than that of BE-type applications. The embodiments of this application therefore treat LC-type applications as critical applications and BE-type applications as non-critical applications. Accordingly, the isolation area resources of a BE-type application can be fixed at 0.
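The resource model described in this subsection (one shared area, one isolation area per application, and BE-type isolation areas fixed at 0) can be sketched as follows. The class name `ResourcePool`, its methods, and the use of integer resource units are hypothetical choices made for illustration.

```python
from typing import Dict


class ResourcePool:
    """One shared area plus one isolation area per application."""

    def __init__(self, total: int, app_types: Dict[str, str]) -> None:
        # Before scheduling, every resource unit sits in the shared area.
        self.shared = total
        self.app_types = dict(app_types)           # app name -> "LC" or "BE"
        self.isolated = {name: 0 for name in app_types}

    def isolate(self, app: str, units: int) -> None:
        """Move units from the shared area into an LC app's isolation area."""
        if self.app_types[app] != "LC":
            raise ValueError("BE-type applications keep an isolation area of 0")
        if units > self.shared:
            raise ValueError("not enough shared resources")
        self.shared -= units
        self.isolated[app] += units

    def release(self, app: str, units: int) -> None:
        """Return units from an isolation area to the shared area."""
        if units > self.isolated[app]:
            raise ValueError("cannot release more than is isolated")
        self.isolated[app] -= units
        self.shared += units

    def usable_by(self, app: str) -> int:
        """An app can use its own isolation area and the whole shared area."""
        return self.isolated[app] + self.shared
```

For instance, starting from 10 shared units and isolating 3 for one LC-type application and 2 for another leaves 5 shared units, which is the transition FIG. 1a describes.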
2. Latency-critical (LC) applications and best-effort (BE) applications.
The service quality of an LC-type application is usually measured by tail latency (TL). When the service rate of an LC-type application is lower than the request arrival rate, a large number of requests queue up for processing, so request queuing time rises sharply and tail latency rises sharply with it. In other words, interference from other applications can have a highly destructive effect on the tail latency of LC-type applications.
For BE-type applications, the main concern in practice is execution time, and the performance loss that a BE-type application suffers from interference is not fatal. BE-type applications are also called batch processing applications.
3. High-concurrency cloud computing system.
In practice, to improve machine resource utilization while guaranteeing application service quality, LC-type applications and BE-type applications are usually co-located on the same physical machine, which hosts multiple virtual machines, to form a high-concurrency cloud computing system. High concurrency means that the system is designed to process a large number of requests in parallel, and it is a factor that must be considered in the architecture of distributed Internet systems. Cloud computing is a form of distributed computing in which a large data processing program is split over the network cloud into many small programs, which are then processed by a system composed of multiple servers, and the results are returned to the user.
The high-concurrency cloud computing system is described below with reference to FIG. 2, which is a schematic diagram of the system architecture of a high-concurrency cloud computing system.
As shown in FIG. 2, in a high-concurrency cloud computing system one physical machine hosts multiple virtual machines, and each virtual machine runs one or more applications. The applications running on the virtual machines include LC-type applications and BE-type applications. Multiple applications running on the same virtual machine can be of the same type or of different types, which is not limited here. For example, in the embodiment shown in FIG. 2, application 2 and application 3 running on virtual machine 2 can both be LC-type applications, both be BE-type applications, or be one LC-type application and one BE-type application. The virtual machines can run their applications at the same time, so the system can process a large number of requests in parallel and thereby support high concurrency.
Next, refer to FIG. 3, which is a schematic flowchart of a resource scheduling method provided in an embodiment of this application. The method includes the following steps:
301. Obtain the remaining interference tolerance of each LC-type application among at least one latency-critical (LC) type application included in the computer system.
Assume that the computer system includes N LC-type applications, where N is a positive integer greater than 2. For each LC-type application, the resource scheduler can obtain the following variables: 1) the ideal tail latency TL_i0 of user i, that is, the tail latency when the LC-type application runs alone; 2) the tail latency TL_i1 of user i after interference, that is, the tail latency when multiple applications run together; and 3) the maximum tail latency M_i that user i can tolerate, whose value is set by the user. These three variables satisfy TL_i0 < M_i and TL_i0 < TL_i1.
根据上述变量,基于公式一,定义用户i能容忍的干扰量(anti-interferencecapability):According to the above variables, based on formula 1, the amount of interference that user i can tolerate (anti-interference capability) is defined as:
公式一: Formula 1:
其中,Ai表示用户i能容忍度的干扰量,由于TLi0<Mi,所以Ai的取值范围在(0,1)之间。Mi越小,Ai越接近0,表示用户i的抗干扰能力越弱;Mi越大,Ai越接近1,表示用户i的抗干扰能力越强。Wherein, Ai represents the amount of interference that user i can tolerate. Since TL i0 <M i , the value range of Ai is between (0, 1). The smaller Mi is , the closer Ai is to 0, indicating that the anti-interference ability of user i is weaker; the larger Mi is, the closer Ai is to 1, indicating that the anti-interference ability of user i is stronger.
根据上述变量,基于公式二,定义用户i实际受到的干扰量(really sufferedinterference):According to the above variables, based on Formula 2, the amount of interference that user i actually suffered is defined as:
公式二: Formula 2:
其中,Ri表示用户i实际受到的干扰量,由于TLi0<TLi1,所以Ri的取值范围在(0,1)之间。TLi1越小,Ri越接近0,表示用户i实际受到的干扰越小;TLi1越大,Ri越接近1,表示用户i实际受到的干扰越大。Wherein, Ri represents the actual interference amount to which user i is subjected. Since TL i0 < TL i1 , the value range of Ri is between (0, 1). The smaller TL i1 is, the closer Ri is to 0, indicating that the actual interference to user i is smaller; the larger TL i1 is, the closer Ri is to 1, indicating that the actual interference to user i is greater.
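The images for Formula 1 and Formula 2 are not reproduced in this text. One pair of definitions that satisfies every property stated above (values in (0, 1), $A_i$ increasing with $M_i$, $R_i$ increasing with $TL_{i1}$) is sketched below. It is an assumption made for illustration and is not confirmed by the source:

```latex
A_i = 1 - \frac{TL_{i0}}{M_i}, \qquad R_i = 1 - \frac{TL_{i0}}{TL_{i1}}
```

Under this assumed form, illustrative values $TL_{i0} = 2$, $M_i = 10$ and $TL_{i1} = 4$ (same time unit) give $A_i = 0.8$ and $R_i = 0.5$.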
From these variables, Formula 3 defines the remaining interference tolerance $S_i$ of LC-type application i:
Formula 3:
By combining Formula 1, Formula 2 and Formula 3, the resource scheduler can determine the remaining interference tolerance of each LC-type application in the computer system. The remaining interference tolerance reflects the remaining anti-interference capability of an application: the larger the remaining interference tolerance, the less interference the application suffers and the stronger its remaining anti-interference capability.
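The computation in step 301 can be sketched in Python as follows. The formula images are missing from this text, so all three closed forms below are assumptions chosen only to match the stated properties: $A_i = 1 - TL_{i0}/M_i$, $R_i = 1 - TL_{i0}/TL_{i1}$, and $S_i = A_i - R_i$. The class name `LcApp` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LcApp:
    name: str
    tl_ideal: float     # TL_i0: tail latency when the application runs alone
    tl_observed: float  # TL_i1: tail latency when co-located with other applications
    tl_max: float       # M_i: maximum tail latency the user tolerates


def tolerable_interference(app: LcApp) -> float:
    """A_i, assumed form 1 - TL_i0 / M_i (not confirmed by the source)."""
    return 1.0 - app.tl_ideal / app.tl_max


def suffered_interference(app: LcApp) -> float:
    """R_i, assumed form 1 - TL_i0 / TL_i1 (not confirmed by the source)."""
    return 1.0 - app.tl_ideal / app.tl_observed


def remaining_tolerance(app: LcApp) -> float:
    """S_i, assumed here to be A_i - R_i; the source formula is missing."""
    return tolerable_interference(app) - suffered_interference(app)


# Illustrative values only.
app = LcApp("search", tl_ideal=2.0, tl_observed=4.0, tl_max=10.0)
print(round(remaining_tolerance(app), 3))
```

With this assumed definition, $S_i$ becomes negative once the observed tail latency exceeds $M_i$, which matches the reading that a small remaining tolerance signals an imminent QoS violation.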
302. From the multiple LC-type applications, obtain the first LC-type application, which has the smallest remaining interference tolerance.
After the remaining interference tolerance of each LC-type application is obtained, the values can be compared to determine the first LC-type application, that is, the one with the smallest remaining interference tolerance.
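A minimal sketch of this selection, assuming the remaining interference tolerances are already available as a mapping from application name to $S_i$ (the function name and the sample values are hypothetical):

```python
def pick_first_lc_app(remaining: dict[str, float]) -> str:
    """Return the name of the LC-type application with the smallest S_i."""
    if not remaining:
        raise ValueError("at least one LC-type application is required")
    return min(remaining, key=remaining.get)


# Illustrative values only.
scores = {"search": 0.05, "cache": 0.40, "web": 0.22}
first = pick_first_lc_app(scores)
print(first)
```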
303. If the remaining interference tolerance of the first LC-type application is less than the tolerance lower limit, increase the first isolation area resources of the first LC-type application.
The resource scheduler also compares the remaining interference tolerance of the first LC-type application with the tolerance lower limit, to determine whether the interference suffered by the first LC-type application exceeds what the application can tolerate. If the remaining interference tolerance of the first LC-type application is less than the tolerance lower limit, the first LC-type application is severely interfered with and is about to violate its quality of service (QoS) target. In that case, the first isolation area resources of the first LC-type application need to be increased, to reduce the interference from other applications.
In the embodiments of this application, the first isolation area resources can be increased in several ways, and the possible cases are described below.
Note that, as explained above, shared area resources are resources used jointly by all applications in the computer system. Therefore, when resource scheduling is required, transferring resources directly from the shared area to the isolation area of the first LC-type application would have a large impact on system performance. Transferring resources from the isolation areas of other LC-type applications is therefore preferred.
1) Transfer resources from the isolation area of another application to the isolation area of the first LC-type application.
In this case, the resource scheduler selects a resource-rich LC-type application as the source of the transfer. If the resource scheduler determines that the computer system contains a third LC-type application whose remaining interference tolerance is greater than the tolerance upper limit, and that the third LC-type application has strippable third isolation area resources, then the isolation area of the third LC-type application holds more resources than the third LC-type application needs to run normally. Some or all of the third isolation area resources can then be transferred to the first isolation area, to increase the first isolation area resources.
2) Transfer resources from the shared area to the isolation area of the first LC-type application.
If the resource scheduler determines that no third LC-type application exists in the computer system, the resource scheduler transfers resources from the resource sharing area to the first isolation area, to increase the first isolation area resources.
Specifically, the third LC-type application may be absent in either of the following situations:
In one situation, the resource scheduler determines that the computer system contains LC-type applications whose remaining interference tolerance is greater than the tolerance upper limit, but none of the isolation area resources of these LC-type applications can be stripped.
In the other situation, the remaining interference tolerance of every LC-type application in the computer system is less than or equal to the tolerance upper limit. That is, the isolation area resources of each LC-type application are only sufficient to keep that application running normally. If an application transferred its own isolation area resources to another LC-type application, its own operation would be affected, which would aggravate the interference between applications.
For example, the resources transferred in the embodiments of this application include partitionable hardware resources, such as the number of central processing unit (CPU) cores and the number of ways of the shared last-level cache. Other hardware resources, such as memory bandwidth and disk bandwidth, may also be transferred, which is not limited here.
In the embodiments of this application, when the first isolation area resources of the first LC-type application are increased, transferring resources from the isolation areas of other LC-type applications is preferred, which minimizes the adverse effect on system performance and improves the practicality of the technical solution. In addition, the first isolation area resources can be increased in different ways in different situations, which suits different scenarios and improves the flexibility and adaptability of the technical solution.
304. If the remaining interference tolerance of the first LC-type application is greater than the tolerance upper limit, transfer the second isolation area resources of a second LC-type application among the multiple LC-type applications to the resource sharing area.
The remaining interference tolerance of the first LC-type application is the smallest among the multiple LC-type applications. If even this value is greater than the tolerance upper limit, every LC-type application in the system has abundant resources and suffers little or no interference. A resource-rich LC-type application can then be selected from the multiple LC-type applications for resource stripping, that is, the second isolation area resources of the second LC-type application are transferred to the resource sharing area.
Here, the second LC-type application is an application whose remaining interference tolerance is greater than that of the first LC-type application. In some optional embodiments, the LC-type application with the largest remaining interference tolerance (that is, the most abundant resources) among the multiple LC-type applications is selected as the second LC-type application for resource stripping. Stripping resources from the most resource-rich application improves resource utilization without affecting the normal operation of the applications.
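The selection of the second LC-type application in step 304 can be sketched as follows. The sketch follows the optional embodiment that picks the largest remaining interference tolerance; the function name, the `None` return for "step 304 does not apply", and the sample values are assumptions.

```python
from typing import Optional


def pick_second_lc_app(remaining: dict[str, float], upper: float) -> Optional[str]:
    """Return the application to strip, or None when step 304 does not apply."""
    if not remaining:
        return None
    first = min(remaining, key=remaining.get)
    if remaining[first] <= upper:
        return None  # the smallest S_i does not exceed the tolerance upper limit
    # The second application must have a larger S_i than the first one.
    richer = {name: s for name, s in remaining.items() if s > remaining[first]}
    if not richer:
        return None
    # Optional embodiment: take the application with the largest S_i.
    return max(richer, key=richer.get)


# Illustrative values only.
print(pick_second_lc_app({"a": 0.6, "b": 0.9, "c": 0.7}, upper=0.5))
```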
In the embodiments of this application, different resource scheduling actions are chosen according to the remaining interference tolerance of each LC-type application in the computer system: either the isolation area resources of an LC-type application are increased, or the isolation area resources of an LC-type application are reduced to increase the shared area resources of the computer system. This allows isolation area resources and shared area resources to be allocated flexibly and improves resource utilization. In addition, scheduling the isolation area resources and the shared area resources reduces the interference between applications and improves system performance.
In some optional embodiments, because the first isolation area resources are increased in step 303, the first isolation area of the first LC-type application is called the beneficiary area, and the area from which resources are transferred to the first isolation area is called the victim area. Similarly, because the second isolation area resources of the second LC-type application are transferred to the shared area in step 304, the second isolation area corresponding to the second LC-type application is called the victim area, and the shared area is called the beneficiary area.
The embodiment shown in FIG. 3 is further described below from the perspective of the victim area and the beneficiary area. Refer to FIG. 4 to FIG. 6, each of which is a schematic flowchart of the resource scheduling method provided in the embodiments of this application.
Refer to FIG. 4. The embodiment shown in FIG. 4 is the process of selecting the beneficiary area, and includes the following steps:
401. Determine, from the multiple LC-type applications, the first LC-type application with the smallest remaining interference tolerance.
Step 401 is similar to step 302 in the embodiment shown in FIG. 3, which has been described above and is not repeated here.
402. Determine whether the remaining interference tolerance of the first LC-type application is less than the tolerance lower limit. If yes, perform step 403; if no, perform step 404.
The resource scheduler compares the remaining interference tolerance of the first LC-type application with the tolerance lower limit, and uses the result to determine whether the isolation area of the first LC-type application is the beneficiary area. The tolerance lower limit may be set by the user according to system requirements, or by the resource scheduler or another device, which is not limited here.
403. Return the first isolation area corresponding to the first LC-type application.
If the remaining interference tolerance of the first LC-type application is less than the tolerance lower limit, the first LC-type application is severely interfered with, and its isolation area resources need to be increased. The first isolation area corresponding to the first LC-type application is therefore determined to be the beneficiary area.
404. Return the shared area.
If the remaining interference tolerance of the first LC-type application is not less than the tolerance lower limit, the first LC-type application, which has the smallest remaining interference tolerance, suffers little or no interference and runs normally, so the isolation area resources of no LC-type application need to be increased. In this case, resources can be stripped from the isolation area of a resource-rich LC-type application and moved to the shared area, so the shared area is determined to be the beneficiary area.
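Steps 401 to 404 can be sketched as a single function. The region labels (`"shared"` and the `"isolation:<name>"` prefix) and the Python function name are hypothetical; only the decision logic comes from the text above.

```python
SHARED_REGION = "shared"  # hypothetical label for the resource sharing area


def find_beneficiary_region(remaining: dict[str, float], lower: float) -> str:
    """Return the area that should receive resources (steps 401-404)."""
    first = min(remaining, key=remaining.get)  # step 401: smallest S_i
    if remaining[first] < lower:               # step 402: below the lower limit?
        return f"isolation:{first}"            # step 403: first isolation area
    return SHARED_REGION                       # step 404: shared area


# Illustrative values only.
print(find_beneficiary_region({"search": 0.05, "cache": 0.40}, lower=0.10))
```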
The resource scheduler can also determine the victim area. Refer to FIG. 5. The embodiment shown in FIG. 5 is the process of selecting the victim area, and includes the following steps:
501. Determine whether the multiple LC-type applications include a third LC-type application whose remaining interference tolerance is greater than the tolerance upper limit and which has strippable isolation area resources. If yes, perform step 502; if no, perform step 503.
The resource scheduler determines whether the multiple LC-type applications include a third LC-type application that suffers little or no interference and has strippable isolation area resources. The degree of interference is determined from the remaining interference tolerance: the larger the remaining interference tolerance, the lower the degree of interference.
The tolerance upper limit may be set by the user according to system requirements, or by the resource scheduler or another device, which is not limited here.
In some optional embodiments, if there are multiple third LC-type applications, the resource scheduler may preferentially select the one with the largest remaining interference tolerance as the application from which resources are transferred, which is not limited here.
502. Return the third isolation area corresponding to the third LC-type application.
If a third LC-type application exists, the resource scheduler determines the third isolation area corresponding to the third LC-type application to be the victim area.
503. Return the shared area.
If no third LC-type application exists, either the remaining interference tolerance of every LC-type application is at or below the tolerance upper limit, or the applications above the limit have no strippable isolation area resources. In both cases, the resources of each LC-type application cannot be transferred to other applications, so the shared area is selected as the victim area.
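Steps 501 to 503 can be sketched as follows, including the optional preference for the largest remaining interference tolerance. The `LcRegion` class, its field names, and the region labels are hypothetical.

```python
from dataclasses import dataclass

SHARED_REGION = "shared"  # hypothetical label for the resource sharing area


@dataclass
class LcRegion:
    app: str
    remaining_tolerance: float  # S_i of the owning LC-type application
    strippable: bool            # whether its isolation resources can be stripped


def find_victim_region(regions: list[LcRegion], upper: float) -> str:
    """Return the area that gives up resources (steps 501-503)."""
    candidates = [r for r in regions
                  if r.remaining_tolerance > upper and r.strippable]  # step 501
    if not candidates:
        return SHARED_REGION                                          # step 503
    # Optional embodiment: prefer the largest remaining tolerance.
    best = max(candidates, key=lambda r: r.remaining_tolerance)
    return f"isolation:{best.app}"                                    # step 502


# Illustrative values only.
regions = [LcRegion("a", 0.9, False), LcRegion("b", 0.7, True), LcRegion("c", 0.2, True)]
print(find_victim_region(regions, upper=0.5))
```

In this example, application "a" has the largest remaining tolerance but no strippable resources, so "b" is selected.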
In the embodiments of this application, in addition to identifying the victim area and the beneficiary area, the resource scheduler can determine the type of resource to transfer from the victim area to the beneficiary area and determine whether the transfer succeeds. This is the process, shown in the embodiment of FIG. 6, of calling multiple functions to transfer resources. The embodiment shown in FIG. 6 can be described as calling the AdjustResource function to adjust the resource allocation. Refer to FIG. 6, which includes the following steps:
601. Determine the victim area by using the findVictimRegion function.
The embodiment shown in FIG. 5 is the specific process of determining the victim area. For details, see FIG. 5; they are not repeated here.
602. Determine the beneficiary area by using the findBeneficiaryRegion function.
The embodiment shown in FIG. 4 is the specific process of determining the beneficiary area. For details, see FIG. 4; they are not repeated here.
Note that step 601 and step 602 are not required to be performed in any particular order. In practice, step 601 may be performed first, step 602 may be performed first, or the two steps may be performed at the same time, which is not limited here.
603. Determine the resource to be adjusted by using the findVictimRegion function.
In the findVictimRegion function, a finite state machine is maintained for each application. Each state of the state machine represents the current resource type, and the resource type to be adjusted in the victim area is determined from the state of the state machine. The state machine transitions when either of two conditions holds: condition 1, the current resource type can no longer be stripped; or condition 2, adjusting the current resource type would increase the system entropy of the computer system (that is, aggravate the interference between applications). A state transition means that the type of resource being stripped changes. For example, assume that the victim area holds three resource types and the current resource type is resource 1. When either condition is met, the state machine transitions, and the next time resources are stripped from this isolation area, the stripped resource is resource 2 rather than resource 1.
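The per-application state machine can be sketched as follows. The class name, the resource type names, and the two predicate callbacks are hypothetical. The text does not say what happens after the last resource type, so wrapping back to the first type is an assumption of this sketch.

```python
from typing import Callable, Sequence


class ResourceTypeFsm:
    """Tracks which resource type is currently stripped from one application."""

    def __init__(self, resource_types: Sequence[str]) -> None:
        if not resource_types:
            raise ValueError("at least one resource type is required")
        self._types = list(resource_types)
        self._index = 0

    @property
    def current(self) -> str:
        return self._types[self._index]

    def next_to_strip(self,
                      can_strip: Callable[[str], bool],
                      raises_entropy: Callable[[str], bool]) -> str:
        """Transition if condition 1 or condition 2 holds, then return the type."""
        if not can_strip(self.current) or raises_entropy(self.current):
            # Assumption: wrap around after the last resource type.
            self._index = (self._index + 1) % len(self._types)
        return self.current


# Illustrative use: CPU cores can no longer be stripped (condition 1).
fsm = ResourceTypeFsm(["cpu_cores", "llc_ways", "mem_bandwidth"])
chosen = fsm.next_to_strip(lambda t: t != "cpu_cores", lambda t: False)
print(chosen)
```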
604. Move one unit of the resource to be adjusted from the victim area to the beneficiary area.
After the victim area and the beneficiary area are determined, the resource scheduler transfers one unit of the resource to be adjusted from the victim area to the beneficiary area, which completes the resource scheduling.
Note that, in practice, the victim area and the beneficiary area may be the same area, in which case no resource scheduling is performed. This can happen for two reasons: either the current resource allocation already meets the requirements, that is, the remaining interference tolerance of every LC-type application lies between the tolerance lower limit and the tolerance upper limit; or no LC-type application has an isolation area that can be stripped.
605. Determine whether the resource was adjusted successfully. If yes, perform step 606; if no, perform step 607.
After the resource scheduling, the resource scheduler evaluates the benefit of this adjustment from the change in system entropy, and then either keeps the change or rolls it back. The system entropy indicates the degree of interference between applications in the computer system.
Specifically, the resource scheduler can obtain the system entropy before and after the scheduling. If the system entropy increases after the resource scheduling, the resource scheduling is determined to be unsuccessful. If the system entropy decreases after the resource scheduling, the resource scheduling is determined to be successful.
606. Return "True".
When the resource is determined to have been adjusted successfully, the resource scheduler may return "True" to indicate that the resource scheduling succeeded.
607. Return "False".
When the resource adjustment is determined to have failed, the resource scheduler may return "False" to indicate that the resource scheduling was unsuccessful. It also cancels the last adjustment and prohibits stripping resources from the current victim area for a period of time.
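Steps 604 to 607 can be sketched as one function that moves a single unit, measures the system entropy before and after, and rolls back on failure. The allocation dictionary, the `measure_entropy` callback, the `banned_until` bookkeeping, and the 5-second cool-down are all hypothetical; the text only states that stripping is prohibited "for a period of time".

```python
import time
from typing import Callable, Dict


def adjust_resource(alloc: Dict[str, int], victim: str, beneficiary: str,
                    measure_entropy: Callable[[Dict[str, int]], float],
                    banned_until: Dict[str, float],
                    cooldown_s: float = 5.0,
                    now: Callable[[], float] = time.monotonic) -> bool:
    """Move one unit from victim to beneficiary; keep it only if entropy drops."""
    if victim == beneficiary or alloc.get(victim, 0) < 1:
        return False                      # nothing to schedule
    if banned_until.get(victim, 0.0) > now():
        return False                      # victim is still in its cool-down
    before = measure_entropy(alloc)
    alloc[victim] -= 1                    # step 604: move one unit
    alloc[beneficiary] = alloc.get(beneficiary, 0) + 1
    if measure_entropy(alloc) < before:   # step 605: did entropy decrease?
        return True                       # step 606
    alloc[victim] += 1                    # step 607: roll back the change
    alloc[beneficiary] -= 1
    banned_until[victim] = now() + cooldown_s
    return False


# Illustrative use with a toy entropy that falls as "lc1" gains units.
alloc = {"shared": 4, "lc1": 1}
ok = adjust_resource(alloc, "shared", "lc1", lambda a: 1.0 / a["lc1"], {})
print(ok, alloc)
```

Passing the clock as the `now` parameter keeps the cool-down logic testable without real waiting.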
In some optional embodiments, the condition for deciding whether to perform resource scheduling is not limited to the comparison, described above, between the remaining interference tolerance of the first LC-type application and the tolerance limits. The current system entropy of the computer system can also be compared with a system entropy threshold, and the two comparison results are combined to decide whether to perform resource scheduling. This case is described below.
In brief, the resource scheduler can obtain the current system entropy of the computer system, which indicates the degree of interference between applications in the computer system. The larger the current system entropy, the greater the interference between the applications. Therefore, when the current system entropy is greater than or equal to the system entropy threshold and the remaining interference tolerance of the first LC-type application is less than the tolerance lower limit, the resource scheduler increases the first isolation area resources. When the current system entropy is greater than or equal to the system entropy threshold and the remaining interference tolerance of the first LC-type application is greater than the tolerance upper limit, the resource scheduler transfers the second isolation area resources of the second LC-type application to the resource sharing area. The specific way in which the resource scheduler transfers resources has been described in the embodiment shown in FIG. 3 and is not repeated here.
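The combined trigger condition can be sketched as a small decision function. The `Action` enum and all parameter names are hypothetical; the comparisons follow the two cases stated above.

```python
from enum import Enum


class Action(Enum):
    GROW_FIRST_ISOLATION = "grow_first_isolation"  # as in step 303
    RELEASE_TO_SHARED = "release_to_shared"        # as in step 304
    NONE = "none"


def decide(entropy: float, entropy_threshold: float,
           first_remaining: float, lower: float, upper: float) -> Action:
    """Combine the entropy check with the tolerance check of the first LC app."""
    if entropy < entropy_threshold:
        return Action.NONE  # the system as a whole is still running normally
    if first_remaining < lower:
        return Action.GROW_FIRST_ISOLATION
    if first_remaining > upper:
        return Action.RELEASE_TO_SHARED
    return Action.NONE


# Illustrative values only.
print(decide(entropy=0.4, entropy_threshold=0.3, first_remaining=0.05, lower=0.1, upper=0.5))
```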
The system entropy threshold may be set by the user, or by the resource scheduler or another device based on the historical running state of the computer system, which is not limited here.
In the embodiments of this application, the remaining interference tolerance and the system entropy are combined to decide whether to perform resource scheduling. This places a stricter condition on resource scheduling, avoids scheduling when it is unnecessary, and saves computing resources. An unnecessary case is, for example, one in which the remaining interference tolerance of the first LC-type application is below the tolerance lower limit but the current system entropy is less than the system entropy threshold. The computer system as a whole is then still running normally, and skipping resource scheduling does not cause a significant problem.
Optionally, the resource scheduler can obtain the current system entropy in several ways: it can compute the current system entropy itself, or it can receive the current system entropy from another device (for example, a computing device), which is not limited here. The following description uses the case in which the resource scheduler computes the value itself as an example.
Because the computer system may include different combinations of applications, the resource scheduler can determine the current system entropy in different ways. The different cases are described below.
1) The computer system includes only LC-type applications.
In this case, the resource scheduler obtains the amount of interference that each LC-type application can tolerate and the amount of interference that each LC-type application actually suffers, and then determines the current system entropy from these two quantities.
Specifically, take the computer system assumed in step 301, which includes N LC-type applications, as an example. The resource scheduler obtains the amount of interference $A_i$ that each LC-type application can tolerate according to Formula 1, and the amount of interference $R_i$ that each LC-type application actually suffers according to Formula 2. Formula 1 and Formula 2 are as follows:
Formula 1: Formula 2:
Here, $TL_{i0}$ denotes the ideal tail latency of user i, $TL_{i1}$ denotes the interfered tail latency of user i, and $M_i$ denotes the maximum tail latency that user i can tolerate. These three variables satisfy $TL_{i0} < M_i$ and $TL_{i0} < TL_{i1}$.
From these variables, based on Formula 4, the resource scheduler can determine the interference $Q_i$ that user i suffers beyond its tolerance:
Formula 4:
Based on Formula 5, the resource scheduler can determine the system entropy $E_{LC}$ of a computer system that includes only N LC-type applications:
Formula 5:
Here, $E_{LC}$ denotes the amount of interference that the LC-type applications cannot tolerate, that is, the system entropy of a computer system in which only LC-type applications exist.
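The images for Formula 4 and Formula 5 are not reproduced in this text. The sketch below assumes one plausible reading: $Q_i$ is the part of $R_i$ that exceeds $A_i$ (zero when the interference is within tolerance), and $E_{LC}$ is the mean of $Q_i$ over the N LC-type applications. Both forms, and the function names, are assumptions rather than the source's definitions.

```python
def excess_interference(a_i: float, r_i: float) -> float:
    """Q_i, assumed here to be max(0, R_i - A_i)."""
    return max(0.0, r_i - a_i)


def lc_entropy(pairs: list[tuple[float, float]]) -> float:
    """E_LC, assumed here to be the mean of Q_i over all (A_i, R_i) pairs."""
    if not pairs:
        raise ValueError("at least one LC-type application is required")
    return sum(excess_interference(a, r) for a, r in pairs) / len(pairs)


# Illustrative values: the first app is within tolerance, the second is not.
print(round(lc_entropy([(0.8, 0.5), (0.4, 0.7)]), 3))
```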
2) The computer system includes both LC-type applications and BE-type applications.
In this case, the resource scheduler obtains a first application identifier and a second application identifier from the application distinguisher or from the user. The first application identifier indicates an LC-type application, and the second application identifier indicates a BE-type application. The resource scheduler then determines the LC-type applications in the computer system according to the first application identifier and the BE-type applications according to the second application identifier. After that, it determines the system entropy of the LC-type applications and the system entropy of the BE-type applications separately, and from them determines the current system entropy of the computer system.
The process of determining the system entropy of the LC-type applications has been described above and is not repeated here. The following describes how to determine the system entropy of the BE-type applications.
The resource scheduler obtains, for each BE-type application among at least one BE-type application, a first instructions per cycle (IPC) value measured when the BE-type application runs alone and a second IPC value measured when the BE-type application is interfered with, and then determines the system entropy of the BE-type applications from the first IPC and the second IPC.
For example, assume that the computer system includes M BE-type applications, where M is a positive integer. The resource scheduler can determine the system entropy $E_{BE}$ of the BE-type applications according to Formula 6:
Formula 6:
Here, $E_{BE}$ denotes the amount of interference that the BE-type applications cannot tolerate; $IPC_{optional}(i)$ denotes the IPC of application i when it runs alone, that is, the first IPC; and $IPC_{real}(i)$ denotes the IPC of application i when it is interfered with, that is, the second IPC.
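The image for Formula 6 is not reproduced in this text. The sketch below assumes one plausible form, the mean relative IPC loss over the M BE-type applications, clamped at zero. This is an illustration of how the first and second IPC could be combined, not the source's formula; the function name and sample values are hypothetical.

```python
def be_entropy(ipc_pairs: list[tuple[float, float]]) -> float:
    """E_BE, assumed here to be the mean of max(0, 1 - IPC_real / IPC_optional).

    Each pair is (first IPC measured alone, second IPC measured under interference).
    """
    if not ipc_pairs:
        raise ValueError("at least one BE-type application is required")
    losses = [max(0.0, 1.0 - real / alone) for alone, real in ipc_pairs]
    return sum(losses) / len(losses)


# Illustrative values: one app loses half its IPC, the other a quarter.
print(be_entropy([(2.0, 1.0), (1.0, 0.75)]))
```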
After the system entropy of the LC-type applications and the system entropy of the BE-type applications are obtained, the current system entropy $E_S$ is determined based on Formula 7:
Formula 7: $E_S = w \times E_{LC} + (1 - w) \times E_{BE}$
Here, the value range of w is [0, 1]. Because the quality of service of LC-type applications is far more important than that of BE-type applications, and because the goal is to achieve the lowest $E_S$ while reducing both $E_{LC}$ and $E_{BE}$, the value range of w is set to [0.5, 1]. This gives reducing $E_{LC}$ priority over reducing $E_{BE}$.
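Formula 7 can be written directly in Python. The default weight of 0.7 is a hypothetical choice inside the stated range [0.5, 1], and the function name is an assumption.

```python
def system_entropy(e_lc: float, e_be: float, w: float = 0.7) -> float:
    """E_S = w * E_LC + (1 - w) * E_BE, with w restricted to [0.5, 1]."""
    if not 0.5 <= w <= 1.0:
        raise ValueError("w must lie in [0.5, 1] so that E_LC takes priority")
    return w * e_lc + (1.0 - w) * e_be


# Illustrative values only.
print(round(system_entropy(0.2, 0.4, w=0.5), 3))
```

With w = 1 the BE-type applications are ignored entirely, and with w = 0.5 both terms carry equal weight.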
In the embodiments of the present application, depending on the types of applications included in the computer system, the resource scheduler can determine the current system entropy in different ways and thereby evaluate the performance of the computer system. The approach can be applied flexibly to different scenarios, which improves the flexibility and practicality of the technical solution of the present application.
In the embodiments of the present application, the resource scheduler also verifies the result of resource scheduling. The resource scheduler obtains the system entropy of the computer system after resource scheduling. If the system entropy after resource scheduling is less than the current system entropy, the resource scheduling is determined to be successful. If the system entropy after resource scheduling is not less than the current system entropy, the last resource adjustment is rolled back.
Optionally, the resource scheduler may calculate the system entropy after resource scheduling based on the formulas above, or may receive the system entropy after resource scheduling from another device (for example, a computing device), which is not specifically limited here.
In the embodiments of the present application, this verification ensures that resource scheduling does not lead to a worse result, that is, it does not aggravate interference between applications, which improves the reliability of the technical solution.
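The verify-then-roll-back behaviour described above can be sketched as follows. The three callables (`measure_entropy`, `apply_adjustment`, `roll_back`) are hypothetical hooks introduced only for illustration; the source does not define such an interface.

```python
from typing import Callable


def apply_with_verification(
    measure_entropy: Callable[[], float],
    apply_adjustment: Callable[[], None],
    roll_back: Callable[[], None],
) -> bool:
    """Apply one resource adjustment and keep it only if system entropy drops.

    Returns True when the adjustment is kept, False when it was rolled back.
    All three callables are assumed hooks, not part of the described system.
    """
    before = measure_entropy()   # current system entropy
    apply_adjustment()           # perform the resource scheduling step
    after = measure_entropy()    # system entropy after scheduling
    if after < before:
        return True              # scheduling is considered successful
    roll_back()                  # entropy did not decrease: undo the adjustment
    return False
```

The comparison is strict, following the statement that an adjustment is rolled back whenever the new entropy is not less than the current one.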
In general, whether resource scheduling is triggered based on the remaining tolerance alone or based on the remaining tolerance together with the current system entropy, the overall process of resource scheduling can be as shown in FIG. 7, which takes as an example the case where the resource scheduler calculates the current system entropy itself. Referring to FIG. 7, FIG. 7 is a schematic flow chart of the resource scheduling method provided in an embodiment of the present application, which includes the following steps:
701. Monitor the tail latency of the LC type applications and the IPC of the BE type applications, and calculate the anti-interference capability of the LC type applications.
The resource scheduler monitors the tail latency of the LC type applications and the IPC of the BE type applications so that the current system entropy can be calculated. The anti-interference capability of an LC type application is reflected by its remaining interference tolerance. The specific calculation has already been described with the formulas above and is not repeated here.
702. Calculate the current system entropy.
The way in which the resource scheduler calculates the current system entropy has been described above and is not repeated here.
703. Determine the LC type application with the smallest remaining tolerance.
Step 703 is similar to step 302 of the embodiment shown in FIG. 3 and is not described again here.
704. Perform resource scheduling.
The resource scheduler performs resource scheduling based on the embodiments shown in FIG. 3 to FIG. 6, transferring resources from the victim area to the beneficiary area.
705. Determine whether the resource adjustment, once successfully applied, has increased the system entropy. If so, execute step 706; if not, execute step 707.
After the resources have been successfully adjusted, it is necessary to determine whether the adjustment has caused the system entropy to increase, and to perform the corresponding operation according to the result, so as to ensure that the degree of interference between applications in the system does not get worse.
706. Roll back the last resource adjustment.
If the system entropy has increased, the interference between applications has become worse. The last resource adjustment needs to be rolled back, and stripping resources from that victim area is prohibited for a subsequent period of time.
707. Call the AdjustResource function to adjust resource allocation.
If the system entropy has not increased, this resource adjustment can be kept, that is, the operation of calling the AdjustResource function to adjust resource allocation is retained.
Exemplarily, assume that the lower tolerance limit is 0.05 and the upper tolerance limit is 0.1. Referring to FIG. 8, FIG. 8 is a schematic diagram of code provided in an embodiment of the present application. The resource scheduler can implement resource scheduling based on the code shown in FIG. 8.
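The code of FIG. 8 is not reproduced in this text. As a rough illustration of how the example limits of 0.05 and 0.1 could drive the trigger decision, the sketch below picks the LC type application with the smallest remaining tolerance and compares it against both limits. The function name, the action labels, and the dictionary input are all assumptions made for illustration; choosing which application gives up isolation area resources is left to the procedure of FIG. 3 to FIG. 6 and is not modelled.

```python
from typing import Dict, Optional, Tuple

LOWER_LIMIT = 0.05  # example lower tolerance limit from the text
UPPER_LIMIT = 0.1   # example upper tolerance limit from the text


def choose_action(
    remaining_tolerance: Dict[str, float],
    lower: float = LOWER_LIMIT,
    upper: float = UPPER_LIMIT,
) -> Tuple[str, Optional[str]]:
    """Return a hypothetical (action, application) pair.

    "grow_isolation": the least tolerant LC application needs more
        isolation area resources.
    "release_to_shared": even the least tolerant LC application has slack,
        so isolation area resources of some LC application may be moved
        to the shared area (which one is decided elsewhere).
    "none": no scheduling is triggered.
    """
    if not remaining_tolerance:
        return ("none", None)
    app = min(remaining_tolerance, key=remaining_tolerance.get)
    value = remaining_tolerance[app]
    if value < lower:
        return ("grow_isolation", app)
    if value > upper:
        return ("release_to_shared", None)
    return ("none", None)
```

For instance, `choose_action({"redis": 0.02, "nginx": 0.3})` selects `"redis"` for additional isolation area resources, because 0.02 is below the lower limit.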
In the description above, the resource scheduler determines whether to perform resource scheduling according to the remaining tolerance of the applications, or according to the remaining tolerance of the applications together with the system entropy. In practice, the resource scheduler can also determine whether to perform resource scheduling in other ways, as described below.
The resource scheduler can obtain first scheduling information or second scheduling information from the service quality predictor. The first scheduling information indicates that the isolation area resources of a target LC type application among the multiple LC type applications are to be increased, and the second scheduling information indicates that the isolation area resources of the target LC type application are to be reduced. The resource scheduler then increases the isolation area resources of the target LC type application according to the first scheduling information, or reduces the isolation area resources of the target LC type application according to the second scheduling information.
Specifically, if it is determined that the isolation area resources of the target LC type application are to be increased, the resource scheduler can treat the isolation area of the target LC type application as the beneficiary area, determine the victim area of this resource scheduling based on step 601 and steps 603 to 607 of the embodiment shown in FIG. 6, and determine whether the resource scheduling can be performed successfully; details are not repeated here. If it is determined that the isolation area resources of the target LC type application are to be reduced, the resource scheduler can treat the isolation area of the target LC type application as the victim area, determine the beneficiary area of this resource scheduling based on steps 602 to 607 of the embodiment shown in FIG. 6, and determine whether the resource scheduling can be performed successfully; details are not repeated here.
In the embodiments of the present application, the resource scheduler can be triggered to perform resource scheduling in several ways. In addition to being triggered by the remaining tolerance, scheduling can be triggered by the remaining tolerance together with the current system entropy, or by indication information. This accommodates different needs in practice and improves the flexibility and adaptability of the technical solution. Furthermore, the resource scheduler can determine the areas involved in resource scheduling directly from the scheduling information, so that it can still perform resource scheduling even when the remaining tolerance or the current system entropy cannot be obtained. This keeps the resource scheduler working smoothly and improves the reliability of the technical solution.
In some optional embodiments, the resource scheduler provided in the embodiments of the present application may also implement the following functions:
1) Calculate the current system entropy when the computer system includes only BE type applications.
Assume that the computer system includes M BE type applications, where M is a positive integer. The resource scheduler can determine the system entropy $E_{BE}$ of the BE type applications according to Formula 6 above. In this case, $E_{BE}$ represents the amount of interference that the BE type applications cannot tolerate, which is also the system entropy of a computer system containing only BE type applications.
2) Determine the total anti-interference capability A of the computer system based on Formula 8:
Formula 8:
Where the total anti-interference capability A refers to the total anti-interference capability of a computer system that includes only N LC type applications, and $A_i$ represents the amount of interference that user i can tolerate, as explained for Formula 1 above and not repeated here.
3) Determine the total amount of interference R suffered by the computer system based on Formula 9:
Formula 9:
Where the total amount of interference R refers to the total amount of interference suffered by a computer system that includes only N LC type applications, and $R_i$ represents the amount of interference actually suffered by user i, as explained for Formula 2 above and not repeated here.
4) Determine the remaining interference capability S of the computer system based on Formula 10:
Formula 10:
Where the remaining interference capability S refers to the remaining interference capability of a computer system that includes only N LC type applications, and $S_i$ represents the remaining interference tolerance of LC type application i, as explained for Formula 3 above and not repeated here.
In the embodiments of the present application, based on Formula 1 to Formula 10, the resource scheduler can evaluate the performance of the computer system from multiple perspectives.
The embodiments of the present application also provide a resource scheduling method that is applied to a service quality predictor.
The main principle of the service quality predictor is that the high tail latency of LC type applications is caused by request queuing. Requests sent by clients to an LC type application wait in a queue to be processed by the server, so the latency of a request consists of the system processing time and the queuing time. When the available system resources are insufficient or scheduling is poor, interference between applications becomes severe and each request takes longer to process. Accordingly, the average queue in the system grows longer, which increases the queuing time of requests.
According to results from queuing theory, growth in queue length leads to an exponential increase in latency. The service quality predictor detects the queuing behaviour of latency-sensitive applications, which indirectly reflects how severely an LC type application is being interfered with. A request sent by a client may wait in queues at several places in the network, such as the network routers of transit nodes, the network interface card buffer, and the operating system buffer.
How severely an application is interfered with directly affects how fast the application can fetch data from the operating system buffer. The service quality predictor monitors the queue in the buffer corresponding to each application, and thereby reflects the degree of interference between different applications running on the same physical machine. The queue in the buffer can be measured using the average network receive queue (Recv-Q) length.
Referring to FIG. 9, FIG. 9 is a schematic flow chart of the resource scheduling method provided in an embodiment of the present application, which includes the following steps:
901. Obtain multiple network receive queue lengths corresponding to the target LC type application.
Taking 30 consecutive samples of the buffer corresponding to the target LC type application as an example, the service quality predictor samples the buffer at a fixed time interval and obtains 30 network receive queue lengths.
Note that both the number of sampled network receive queue lengths and the sampling interval can be chosen according to practical needs and are not specifically limited here.
902. Calculate the mean of the multiple network receive queue lengths to obtain the average network receive queue length.
After the 30 network receive queue lengths are obtained, the average network receive queue length is used to reflect the degree to which the target LC type application is interfered with.
903. Determine whether the average network receive queue length is greater than a length threshold. If so, execute step 904; if not, execute step 905.
If the average network receive queue length corresponding to the target LC type application is greater than the length threshold, the target LC type application is considered to be severely interfered with and short of resources, and resources need to be allocated to it. If the average network receive queue length corresponding to the target LC type application is less than or equal to the length threshold, it is necessary to further determine whether to strip isolation area resources from the target LC type application.
904. Send first scheduling information to the resource scheduler, where the first scheduling information indicates that the isolation area resources of the target LC type application are to be increased.
When the average network receive queue length corresponding to the target LC type application is greater than the length threshold, the service quality predictor sends the first scheduling information to the resource scheduler to instruct the resource scheduler to add isolation area resources for the target LC type application.
905. Determine whether the proportion of network receive queue lengths equal to 0 among the multiple network receive queue lengths is greater than a proportion threshold. If so, execute step 906; if not, return to step 901.
When the average network receive queue length corresponding to the target LC type application is less than or equal to the length threshold, the service quality predictor needs to further check the proportion of samples equal to 0, that is, determine whether the proportion of network receive queue lengths equal to 0 among the multiple network receive queue lengths is greater than the proportion threshold. If it is greater than the proportion threshold, the isolation area resources of the target LC type application are considered to be in surplus and can be stripped. If it is not, the isolation area resources of the target LC type application are not adjusted, and the network receive queue length of the target LC type application continues to be monitored.
The proportion threshold may be determined by the service quality predictor according to the historical operating state of the computer system, or by another device in the computer system (for example, a computing device), which is not specifically limited here.
Exemplarily, assume that 25 of the 30 network receive queue lengths are 0 and the proportion threshold is 80%. The proportion of zero-valued samples is 25/30, approximately 83.3%, which exceeds 80%, so the service quality predictor executes step 906.
906. Send second scheduling information to the resource scheduler, where the second scheduling information indicates that the isolation area resources of the target LC type application are to be reduced.
When the average network receive queue length corresponding to the target LC type application is less than or equal to the length threshold, and the proportion of network receive queue lengths equal to 0 among the multiple network receive queue lengths is greater than the proportion threshold, the service quality predictor sends the second scheduling information to the resource scheduler to instruct the resource scheduler to reduce the isolation area resources of the target LC type application.
In the embodiments of the present application, the service quality predictor monitors the buffer corresponding to each LC type application and can determine whether resource scheduling is needed without calculating the system entropy or the tolerance, which simplifies the process and saves computing resources.
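The decision made in steps 902 to 906 can be sketched as follows. The function name and the string return values are illustrative assumptions; how Recv-Q samples are collected and how the scheduling information is delivered to the resource scheduler are outside the scope of this sketch.

```python
from statistics import mean
from typing import Optional, Sequence


def predict_scheduling(
    recv_q_lengths: Sequence[int],
    length_threshold: float,
    zero_ratio_threshold: float,
) -> Optional[str]:
    """Decide which scheduling information, if any, to send.

    Returns "increase" (first scheduling information), "decrease" (second
    scheduling information), or None (keep monitoring, back to step 901).
    """
    if not recv_q_lengths:
        raise ValueError("at least one Recv-Q sample is required")
    # Steps 902-903: compare the average queue length with the threshold.
    if mean(recv_q_lengths) > length_threshold:
        return "increase"  # step 904
    # Step 905: share of samples whose queue length is exactly 0.
    zero_ratio = sum(1 for q in recv_q_lengths if q == 0) / len(recv_q_lengths)
    if zero_ratio > zero_ratio_threshold:
        return "decrease"  # step 906
    return None
```

With the example from the text (25 zero-valued samples out of 30 and a proportion threshold of 0.8), the function returns `"decrease"` as long as the average stays at or below the length threshold.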
The embodiments of the present application also provide an application identification method that is applied to an application distinguisher.
The application distinguisher decides whether an application running on a virtual machine is an LC type application or a BE type application mainly from the send/receive bandwidth ratio of the virtual machine's network traffic. The network send/receive bandwidth ratio of an LC type application stays essentially constant under different loads, whereas the network send/receive bandwidth ratio of a BE type application varies greatly over time.
Exemplarily, refer to FIG. 10a and FIG. 10b. FIG. 10a takes the send bandwidth ratio of a Redis application, an LC type application, as an example, and FIG. 10b takes the send bandwidth ratio of a Terasort application based on the Spark framework, a BE type application, as an example. As shown in FIG. 10a, the send bandwidth ratio of the Redis application varies very little and can be regarded as constant. As shown in FIG. 10b, the send bandwidth ratio of the Spark-based Terasort application varies greatly.
Note that both FIG. 10a and FIG. 10b use the send bandwidth ratio as an example. The send bandwidth ratio is the proportion of the total bandwidth taken up by the application's send network bandwidth. In practice, the receive bandwidth ratio can be compared instead, where receive bandwidth ratio = 1 - send bandwidth ratio.
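The relationship between the two ratios can be made concrete with a short sketch. The function names and the use of byte counters as inputs are assumptions for illustration; the text only states that the send ratio is the send bandwidth divided by the total bandwidth, and that the receive ratio is its complement.

```python
def send_bandwidth_ratio(tx_bandwidth: float, rx_bandwidth: float) -> float:
    """Share of the total (send + receive) bandwidth used for sending."""
    total = tx_bandwidth + rx_bandwidth
    if total <= 0:
        raise ValueError("total bandwidth must be positive")
    return tx_bandwidth / total


def receive_bandwidth_ratio(tx_bandwidth: float, rx_bandwidth: float) -> float:
    """Receive bandwidth ratio = 1 - send bandwidth ratio."""
    return 1.0 - send_bandwidth_ratio(tx_bandwidth, rx_bandwidth)
```

Because the two ratios always sum to 1, tracking either one carries the same information about how stable the traffic pattern is.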
The phenomenon shown in FIG. 10a and FIG. 10b arises because LC type applications and BE type applications use network data in different patterns. Although the size of each network request packet sent or received by the same virtual machine may differ, the packet sizes can be regarded as always following some distribution. Therefore, when a sufficient number of requests of a virtual machine in the same scenario are measured, the mean size of the requests sent or received by the application should approach the mean of that distribution. Of the two types, an LC type application can process many requests per unit time (typically $10^2$ to $10^6$), so the mean obtained over repeated samplings of an LC type application should be essentially constant. A BE type application can usually process only a few requests per unit time (typically 0 to 10), and the sampled data depends on which stage the BE type application is in, so it usually varies considerably. It can therefore be assumed that the ratio of send bandwidth to receive bandwidth measured at different times changes very little for an LC type application and changes greatly for a BE type application.
Referring to FIG. 11, FIG. 11 is a schematic flow chart of the application identification method provided in an embodiment of the present application, which includes the following steps:
1101. Obtain multiple average values of total network bandwidth of multiple applications in the computer system at the current stage.
Exemplarily, take obtaining the average total network bandwidth of one application as an example. Define the current stage as the i-th stage, and let $X_{ij}$ denote the data of the j-th interval in the current stage. The average total network bandwidth of the current stage is determined from these interval values based on Formula 11.
Formula 11:
According to Formula 11, the average total network bandwidth of each of the multiple applications in the computer system at the current stage can be calculated.
In some optional embodiments, a computing device may monitor the total network bandwidth of each application, calculate the average total network bandwidth, and send it to the application distinguisher, so that the application distinguisher obtains the multiple average values of total network bandwidth, which is not specifically limited here.
1102. Determine, according to the multiple average values of total network bandwidth, multiple interval coefficients of variation of total network bandwidth of the multiple applications at the current stage.
After obtaining the multiple average values of total network bandwidth, the application distinguisher can determine the multiple interval coefficients of variation (ICOV) of total network bandwidth based on Formula 12:
Formula 12:
In some optional embodiments, a computing device may calculate the interval coefficient of variation of total network bandwidth of each application and send it to the application distinguisher, so that the application distinguisher obtains the interval coefficients of variation of total network bandwidth, which is not specifically limited here.
1103. If the multiple interval coefficients of variation of total network bandwidth are greater than an interval coefficient threshold, obtain multiple absolute differences between the multiple ending send/receive bandwidth ratios of the multiple applications in the current stage and the multiple starting send/receive bandwidth ratios in the next stage.
The application distinguisher needs to monitor stage changes during program execution. A BE type application may spend a long time in a network data transfer or in a computation, during which the collected data stays constant and can lead to misclassification. Stage changes during program execution therefore need to be detected, and the data before and after a stage change analyzed.
Accordingly, the application distinguisher determines whether the interval coefficient of variation of total network bandwidth is greater than the interval coefficient threshold. If the interval coefficient of variation of total network bandwidth of an application is greater than the interval coefficient threshold, the application has begun a stage change, and the data before and after the stage change of that application is collected and analyzed. The interval coefficient threshold can be set according to practical needs. The smaller the threshold, the subtler the stage changes that can be detected, so more stages are detected; the larger the threshold, the larger a stage change must be before it is detected. Exemplarily, the interval coefficient threshold can be 0.25.
When the interval coefficient of variation of total network bandwidth is greater than the interval coefficient threshold, the application distinguisher obtains the absolute difference between the send/receive bandwidth ratio of the application at the end of the current stage and the send/receive bandwidth ratio at the start of the next stage. The send bandwidth ratio is the proportion of the total network bandwidth taken up by the application's send network bandwidth, and the receive bandwidth ratio is the proportion of the total network bandwidth taken up by the application's receive network bandwidth.
Exemplarily, taking the send bandwidth ratio as an example, assume that 6 data points are collected in the current stage. The send bandwidth ratio of the last of these 6 data points is taken as the ending send bandwidth ratio of the current stage. The next data point collected after the current stage is the first data point of the next stage, and its send bandwidth ratio is taken as the starting send bandwidth ratio of the next stage.
In some optional embodiments, a computing device may calculate the multiple absolute differences and send them to the application distinguisher, so that the application distinguisher obtains the multiple absolute differences, which is not specifically limited here.
1104. Obtain multiple coefficients of variation of the send/receive bandwidth ratio of the multiple applications within a target time period before and after the current stage.
Exemplarily, take obtaining the coefficient of variation of the send/receive bandwidth ratio of one application as an example. Assume that the target time period is 10 seconds, that is, the coefficient of variation (COV) of the send/receive bandwidth ratio is calculated over the period from 10 seconds before the current stage to 10 seconds after the current stage. The COV is the ratio of the standard deviation σ of the send/receive bandwidth ratio over this period to the mean μ of the send/receive bandwidth ratio over this period. It is a normalized measure of the dispersion of a probability distribution.
In some optional embodiments, a computing device may calculate the multiple coefficients of variation of the send/receive bandwidth ratio and send them to the application distinguisher, so that the application distinguisher obtains the multiple coefficients of variation of the send/receive bandwidth ratio, which is not specifically limited here.
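The coefficient of variation used in step 1104 is the standard deviation divided by the mean. The sketch below assumes the population standard deviation (`statistics.pstdev`); the text does not say whether the population or the sample form is intended, so this choice and the function name are assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(ratios: Sequence[float]) -> float:
    """COV = sigma / mu over the send (or receive) bandwidth ratios
    sampled in the target time window.

    Uses the population standard deviation, which is an assumption.
    """
    if not ratios:
        raise ValueError("at least one sample is required")
    mu = mean(ratios)
    if mu == 0:
        raise ValueError("COV is undefined when the mean is 0")
    return pstdev(ratios) / mu
```

A nearly constant ratio, as expected for an LC type application, yields a COV close to 0, while a ratio that swings between stages yields a much larger value.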
1105.根据多个差值绝对值或多个发送/接收带宽比例变异系数,确定多个应用中LC型应用的标识为第一应用标识,多个应用中BE型应用的标识为第二应用标识。1105. According to multiple absolute values of differences or multiple transmission/reception bandwidth ratio variation coefficients, determine that the identifier of the LC type application in the multiple applications is the first application identifier, and the identifier of the BE type application in the multiple applications is the second application identifier.
应用区分器可以从多个应用中,确定差值绝对值大于差值阈值,和/或,发送/接收带宽比例变异系数大于系数阈值的应用,为BE型应用。并确定多个应用中BE型应用之外的应用为LC型应用。之后,标记LC型应用的标识为第一应用标识,标记BE型应用的标识为第二应用标识。The application distinguisher can determine, from multiple applications, that an application whose absolute value of difference is greater than a difference threshold and/or whose coefficient of variation of the transmission/reception bandwidth ratio is greater than a coefficient threshold is a BE type application. And determine that applications other than the BE type applications in the multiple applications are LC type applications. Then, the identifier of the LC type application is marked as a first application identifier, and the identifier of the BE type application is marked as a second application identifier.
具体来说,应用区分器区分应用类型的依据有多种,下面分别对可能的情况进行说明。Specifically, there are various bases for the application distinguisher to distinguish application types, and possible situations are described below respectively.
1) Among the multiple applications, determine that applications whose absolute difference is greater than the difference threshold are BE type applications, and that the other applications are LC type applications.
2) Among the multiple applications, determine that applications whose send/receive bandwidth ratio coefficient of variation is greater than the coefficient threshold are BE type applications, and that the other applications are LC type applications.
3) Among the multiple applications, determine that applications whose absolute difference is greater than the difference threshold and whose send/receive bandwidth ratio coefficient of variation is greater than the coefficient threshold are BE type applications, and that the other applications are LC type applications.
In the embodiments of this application, the application type can be determined on several bases. Case 1) and case 2) place relatively modest demands on the distinguishing capability of the application distinguisher and allow a certain error tolerance. Case 3) is comparatively strict, which helps distinguish LC type applications from BE type applications clearly and improves the reliability of the system.
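The three bases listed above can be sketched as a single function with a selectable rule. The function name `classify`, the `mode` values, and the example thresholds are hypothetical and chosen only for illustration; the description leaves the thresholds to the actual application.

```python
def classify(abs_diff, ratio_cov, diff_threshold, cov_threshold, mode):
    """Return "BE" or "LC" for one application.

    mode is a hypothetical selector:
      "diff" -> case 1), only the absolute difference is checked
      "cov"  -> case 2), only the coefficient of variation is checked
      "both" -> case 3), both conditions must hold
    """
    diff_hit = abs_diff > diff_threshold
    cov_hit = ratio_cov > cov_threshold
    if mode == "diff":
        is_be = diff_hit
    elif mode == "cov":
        is_be = cov_hit
    elif mode == "both":
        is_be = diff_hit and cov_hit
    else:
        raise ValueError(f"unknown mode: {mode}")
    return "BE" if is_be else "LC"


# Same measurements, different rules (thresholds are made-up values).
print(classify(0.4, 0.1, diff_threshold=0.3, cov_threshold=0.2, mode="diff"))
print(classify(0.4, 0.1, diff_threshold=0.3, cov_threshold=0.2, mode="both"))
```

An application that exceeds only one threshold is labelled BE under case 1) or case 2) but stays LC under the stricter case 3), which is the trade-off the paragraph above describes.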
1106.向资源调度器发送第一应用标识和第二应用标识,以使资源调度器区分BE型应用和LC型应用。1106. Send the first application identifier and the second application identifier to the resource scheduler so that the resource scheduler can distinguish between BE type applications and LC type applications.
在标识出第一应用标识和第二应用标识之后,应用区分器可以向资源调度器发送第一应用标识和第二应用标识,使得资源调度器根据第一应用标识确定LC型应用,根据第二应用标识确定BE型应用。After identifying the first application identifier and the second application identifier, the application distinguisher may send the first application identifier and the second application identifier to the resource scheduler, so that the resource scheduler determines the LC type application according to the first application identifier and determines the BE type application according to the second application identifier.
需要注意的是,本申请实施例中,区间系数阈值、系数阈值、差值阈值均可以根据实际应用的需要确定,具体此处不做限定。It should be noted that in the embodiments of the present application, the interval coefficient threshold, the coefficient threshold, and the difference threshold can all be determined according to the needs of the actual application, and are not specifically limited here.
本申请实施例中,应用区分器能够对计算机系统中多个应用的应用类型进行识别,并告知资源调度器各个应用的应用标识,为资源调度器进行资源调度提供了技术支持,提升了技术方案的可实现性。In an embodiment of the present application, the application distinguisher can identify the application types of multiple applications in a computer system and inform the resource scheduler of the application identifier of each application, providing technical support for the resource scheduler to perform resource scheduling and improving the feasibility of the technical solution.
From the description of the embodiment shown in FIG. 11, it can be seen that the application distinguisher actually makes three judgments and performs different operations depending on their results. The application identification method is described below, taking as an example the case in the embodiment shown in FIG. 11 in which applications whose absolute difference is greater than the difference threshold, or whose send bandwidth ratio coefficient of variation is greater than the coefficient threshold, are determined to be BE type applications, and the other applications are determined to be LC type applications.
请参阅图12,图12为本申请实施例提供的应用识别方法的流程示意图,包括以下步骤:Please refer to FIG. 12 , which is a flow chart of an application identification method provided in an embodiment of the present application, comprising the following steps:
1201.检测目标应用的阶段变化。1201. Detect phase changes of target applications.
应用区分器通过检测目标应用在当前阶段的网络总带宽区间变异系数,来检测目标应用的阶段变化。具体的,与图11所示实施例中的步骤1101和步骤1102类似,具体此处不再赘述。The application distinguisher detects the phase change of the target application by detecting the coefficient of variation of the total network bandwidth interval of the target application in the current phase. Specifically, it is similar to step 1101 and step 1102 in the embodiment shown in FIG11 , and will not be described in detail here.
1202.确定目标应用是否发生阶段性变化,若是,则执行步骤1203,若否,则执行步骤1201。1202. Determine whether the target application undergoes a phased change. If so, execute step 1203; if not, execute step 1201.
如果目标应用在当前阶段的网络总带宽区间变异系数大于区间系数阈值,则认为目标应用发生了阶段性变化;如果目标应用在当前阶段的网络总带宽区间变异系数不大于区间系数阈值,则认为目标应用没有发生阶段性变化。If the interval variation coefficient of the total network bandwidth of the target application in the current stage is greater than the interval coefficient threshold, it is considered that the target application has undergone a phased change; if the interval variation coefficient of the total network bandwidth of the target application in the current stage is not greater than the interval coefficient threshold, it is considered that the target application has not undergone a phased change.
1203.确定发送带宽比例变化是否满足条件,若是,则执行步骤1206,若否,则执行步骤1204。1203. Determine whether the change in the sending bandwidth ratio meets the conditions, if so, execute step 1206, if not, execute step 1204.
在确定目标应用发生了阶段性变化的情况下,应用区分器会进一步确定目标应用的发送带宽比例的变化是否满足条件,即确定目标应用在当前阶段的末尾发送带宽比例与下一阶段的起始发送带宽比例的差值绝对值是否大于差值阈值。若是,则认为满足条件,若否,则认为不满足条件。When it is determined that the target application has undergone a phased change, the application distinguisher will further determine whether the change in the target application's transmission bandwidth ratio meets the condition, that is, whether the absolute value of the difference between the target application's transmission bandwidth ratio at the end of the current phase and the starting transmission bandwidth ratio of the next phase is greater than the difference threshold. If so, the condition is considered to be met, otherwise, the condition is considered not to be met.
1204.确定发送带宽比例变异系数是否大于系数阈值,若是,则执行步骤1206,若否,则执行步骤1205。1204. Determine whether the transmission bandwidth ratio variation coefficient is greater than the coefficient threshold, if so, execute step 1206, if not, execute step 1205.
在发送带宽比例变化不满足条件的情况下,应用区分器会确定在当前应用阶段的前后目标时间段内的发送带宽比例变异系数是否大于系数阈值,并根据结果不同,执行不同的操作。When the change in the sending bandwidth ratio does not meet the condition, the application distinguisher determines whether the sending bandwidth ratio variation coefficient in the target time period before and after the current application stage is greater than the coefficient threshold, and performs different operations according to different results.
1205.确定目标应用为LC型应用,标记第一应用标识。1205. Determine that the target application is an LC type application and mark the first application identifier.
1206. Determine that the target application is a BE type application and mark it with the second application identifier.
步骤1205和步骤1206,在图11所示实施例中步骤1105中进行了说明,此处不再赘述。Step 1205 and step 1206 are described in step 1105 in the embodiment shown in FIG. 11 , and will not be described again here.
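The flow of steps 1201 to 1206 can be sketched as one function, following the rule in step 1105 that an application exceeding either threshold is a BE type application. The function name `identify`, the constants, and the convention of returning `None` when no phase change is detected are assumptions made for this sketch.

```python
FIRST_APP_ID = "first"    # hypothetical value of the identifier for LC type
SECOND_APP_ID = "second"  # hypothetical value of the identifier for BE type


def identify(interval_cov, abs_diff, ratio_cov,
             interval_cov_threshold, diff_threshold, cov_threshold):
    """Return the identifier for the target application, or None.

    None means no phase change was detected, so the caller keeps
    monitoring (back to step 1201).
    """
    # Step 1202: has the application undergone a phase change?
    if interval_cov <= interval_cov_threshold:
        return None
    # Step 1203: does the send bandwidth ratio jump across the phase boundary?
    if abs_diff > diff_threshold:
        return SECOND_APP_ID  # step 1206
    # Step 1204: is the send bandwidth ratio unstable around the phase?
    if ratio_cov > cov_threshold:
        return SECOND_APP_ID  # step 1206
    return FIRST_APP_ID       # step 1205


print(identify(0.9, 0.05, 0.02, 0.5, 0.3, 0.2))
```

The checks are ordered as in the flow, so the coefficient of variation is only consulted when the absolute difference does not already decide the result.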
本申请实施例还提供了一种资源调度系统,请参阅图13a,图13a为本申请实施例提供的资源调度系统的一个结构示意图。其中,图13a中的App1~AppN表示N个应用(application)。The embodiment of the present application further provides a resource scheduling system, please refer to Figure 13a, which is a schematic diagram of the structure of the resource scheduling system provided by the embodiment of the present application. App1 to AppN in Figure 13a represent N applications.
如图13a所示,资源调度系统包括资源调度器,资源调度器用于:获取计算机系统中包括的至少一个延迟敏感LC型应用中每个LC型应用的剩余干扰容忍度。从多个LC型应用中,获取剩余干扰容忍度最小的第一LC型应用。若第一LC型应用的剩余干扰容忍度小于容忍度下限,则增加第一LC型应用的第一隔离区资源。若第一LC型应用的剩余干扰容忍度大于容忍度上限,则将多个LC型应用中的第二LC型应用的第二隔离区资源转移至资源共享区。As shown in FIG13a, the resource scheduling system includes a resource scheduler, and the resource scheduler is used to: obtain the residual interference tolerance of each LC type application in at least one delay-sensitive LC type application included in the computer system. From multiple LC type applications, obtain a first LC type application with the smallest residual interference tolerance. If the residual interference tolerance of the first LC type application is less than the lower limit of the tolerance, increase the first isolation area resources of the first LC type application. If the residual interference tolerance of the first LC type application is greater than the upper limit of the tolerance, transfer the second isolation area resources of the second LC type application in the multiple LC type applications to the resource sharing area.
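The decision made by the resource scheduler above can be sketched as follows. The class `LCApp`, the function `schedule_step`, and the action names are hypothetical; how the second LC type application is chosen is not covered by this sketch, so the function only reports which branch applies.

```python
from dataclasses import dataclass


@dataclass
class LCApp:
    name: str
    remaining_tolerance: float  # remaining interference tolerance


def schedule_step(lc_apps, lower_limit, upper_limit):
    """Return (action, name of the first LC type application)."""
    if not lc_apps:
        return ("none", None)
    # The first LC type application is the one with the smallest
    # remaining interference tolerance.
    first = min(lc_apps, key=lambda app: app.remaining_tolerance)
    if first.remaining_tolerance < lower_limit:
        return ("increase_first_isolation", first.name)
    if first.remaining_tolerance > upper_limit:
        # Even the most constrained application has slack, so isolation
        # area resources of a second LC type application can be released
        # to the resource sharing area.
        return ("release_second_isolation_to_shared", first.name)
    return ("none", first.name)


apps = [LCApp("app1", 0.1), LCApp("app2", 0.6)]
print(schedule_step(apps, lower_limit=0.2, upper_limit=0.8))
```

Checking only the minimum is sufficient for the upper-limit branch: if the smallest remaining tolerance exceeds the upper limit, every LC type application does.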
资源调度器还可以执行前述图1a至图8所示实施例中资源调度器所执行的操作,此处不再赘述。The resource scheduler may also perform the operations performed by the resource scheduler in the embodiments shown in the aforementioned FIG. 1a to FIG. 8 , which will not be described in detail here.
在一些可选实施例中,资源调度系统还包括服务质量预测器,服务质量预测器用于:获取目标LC型应用对应的多个网络接收队列长度。计算多个网络接收队列长度的均值,得到平均网络接收队列长度。若平均网络接收队列长度大于长度阈值,则向资源调度器发送第一调度信息,第一调度信息指示增加目标LC型应用的隔离区资源。若平均网络接收队列长度小于或等于长度阈值,且多个网络接收队列长度中取值为0的网络接收队列长度在多个网络接收队列长度中的占比大于比例阈值,则向资源调度器发送第二调度信息,第二调度信息指示减少目标LC型应用的隔离区资源。In some optional embodiments, the resource scheduling system further includes a service quality predictor, which is used to: obtain multiple network receive queue lengths corresponding to the target LC type application. Calculate the average of the multiple network receive queue lengths to obtain the average network receive queue length. If the average network receive queue length is greater than the length threshold, send first scheduling information to the resource scheduler, and the first scheduling information indicates to increase the isolation area resources of the target LC type application. If the average network receive queue length is less than or equal to the length threshold, and the proportion of the network receive queue length with a value of 0 in the multiple network receive queue lengths is greater than the proportion threshold, send second scheduling information to the resource scheduler, and the second scheduling information indicates to reduce the isolation area resources of the target LC type application.
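The service quality predictor's rule above can be sketched as a small function. The function name `predict`, the return values, and the example thresholds are hypothetical; returning `None` when neither condition holds is an assumption, since the description does not say what happens in that case.

```python
def predict(queue_lengths, length_threshold, zero_ratio_threshold):
    """Return "increase", "decrease", or None for the target LC application.

    queue_lengths: sampled network receive queue lengths (non-empty list).
    """
    average = sum(queue_lengths) / len(queue_lengths)
    if average > length_threshold:
        return "increase"   # first scheduling information
    zero_share = queue_lengths.count(0) / len(queue_lengths)
    if zero_share > zero_ratio_threshold:
        return "decrease"   # second scheduling information
    return None             # assumption: nothing is sent


print(predict([8, 10, 12], length_threshold=5, zero_ratio_threshold=0.5))
print(predict([0, 0, 0, 4], length_threshold=5, zero_ratio_threshold=0.5))
```

A long average queue suggests requests are waiting for resources, while a queue that is empty most of the time suggests the isolation area is larger than the application needs.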
服务质量预测器还可以执行前述图9所示实施例中服务质量预测器所执行的操作,此处不再赘述。The service quality predictor may also perform the operations performed by the service quality predictor in the embodiment shown in FIG. 9 , which will not be described in detail here.
在一些可选实施例中,资源调度系统还包括应用区分器,应用区分器用于:获取计算机系统中多个应用在当前阶段的多个网络总带宽平均值。根据多个网络总带宽平均值,确定多个应用在当前阶段的多个网络总带宽区间变异系数。若多个网络总带宽区间变异系数大于区间系数阈值,则获取多个应用在当前阶段的多个末尾发送/接收带宽比例与下一阶段的多个起始发送/接收带宽比例的多个差值绝对值。获取多个应用在当前阶段的前后目标时间段内的多个发送/接收带宽比例变异系数。根据多个差值绝对值或多个发送/接收带宽比例变异系数,确定多个应用中LC型应用的标识为第一应用标识,多个应用中BE型应用的标识为第二应用标识。向资源调度器发送第一应用标识和第二应用标识,以使资源调度器区分BE型应用和LC型应用。In some optional embodiments, the resource scheduling system further includes an application distinguisher, which is used to: obtain multiple average network total bandwidths of multiple applications in the computer system at the current stage. According to the multiple average network total bandwidths, determine the multiple interval variation coefficients of the multiple network total bandwidths of the multiple applications at the current stage. If the multiple interval variation coefficients of the multiple network total bandwidths are greater than the interval coefficient threshold, obtain multiple absolute values of the difference between the multiple end send/receive bandwidth ratios of the multiple applications at the current stage and the multiple start send/receive bandwidth ratios of the next stage. Obtain multiple coefficients of variation of the send/receive bandwidth ratios of the multiple applications in the target time periods before and after the current stage. According to the multiple absolute values of the difference or the multiple coefficients of variation of the send/receive bandwidth ratios, determine that the identifier of the LC type application in the multiple applications is the first application identifier, and the identifier of the BE type application in the multiple applications is the second application identifier. Send the first application identifier and the second application identifier to the resource scheduler so that the resource scheduler distinguishes between the BE type application and the LC type application.
应用区分器还可以执行前述图11至图12所示实施例中应用区分器所执行的操作,此处不再赘述The application distinguisher can also perform the operations performed by the application distinguisher in the embodiments shown in the above-mentioned Figures 11 to 12, which will not be repeated here.
In some optional embodiments, the resource scheduling system further includes a computing device, which is configured to send to the resource scheduler at least one of the following: the current system entropy; the amount of interference that each LC type application can tolerate; the amount of interference that each LC type application actually experiences; the first instructions-per-cycle count of each BE type application when it runs alone; or the second instructions-per-cycle count of each BE type application after it is subjected to interference.
在一些可选的实施例中,计算设备还用于获取目标LC型应用对应的多个网络接收队列长度。计算多个网络接收队列长度的均值,得到平均网络接收队列长度。向服务质量预测器发送平均网络接收队列长度。In some optional embodiments, the computing device is further configured to obtain a plurality of network receiving queue lengths corresponding to the target LC type application, calculate the average of the plurality of network receiving queue lengths to obtain an average network receiving queue length, and send the average network receiving queue length to the service quality predictor.
In some optional embodiments, the computing device is further configured to send to the application distinguisher at least one of the following: the average total network bandwidths of the multiple applications in the computer system in the current phase; the total network bandwidth interval coefficients of variation of the multiple applications in the current phase; the absolute differences between the end send/receive bandwidth ratios of the multiple applications in the current phase and their start send/receive bandwidth ratios in the next phase; or the send/receive bandwidth ratio coefficients of variation of the multiple applications in the target time periods before and after the current phase.
It can be understood that if the resource scheduling system does not include a separate computing device, the resource scheduler may include a computing module for calculating data related to the current system entropy, the remaining interference tolerance, and so on; this is not specifically limited here. The service quality predictor may calculate the average network receive queue length itself. The application distinguisher may itself calculate the data related to total network bandwidth and the send/receive bandwidth data of each application; this is not specifically limited here.
本申请实施例提供的资源调度系统,可以应用在“黑盒场景”下,请参阅图13b,图13b为本申请实施例提供的资源调度系统的另一个结构示意图。The resource scheduling system provided in the embodiment of the present application can be applied in a "black box scenario", please refer to Figure 13b, which is another structural diagram of the resource scheduling system provided in the embodiment of the present application.
在公有云等场景下,由于对用户数据安全和隐私的要求,云服务提供商通常不能获取运行在虚拟机内部的应用信息(如应用名称,应用类型,应用的尾延迟等),只能收集虚拟机使用到的物理资源的状态和操作系统级的系统开销等数据,这种场景被称为“黑盒场景”。In scenarios such as public clouds, due to requirements for user data security and privacy, cloud service providers are usually unable to obtain application information running inside virtual machines (such as application name, application type, application tail latency, etc.). They can only collect data such as the status of physical resources used by the virtual machines and system overhead at the operating system level. This scenario is called a "black box scenario."
图13b中,各个器件的功能与图13a中各个器件的功能类似,此处不再赘述。In FIG. 13 b , the functions of the various components are similar to those of the various components in FIG. 13 a , and are not described in detail here.
本申请实施例还提供了一种资源调度器,该资源调度器应用于云计算系统。请参阅图14,图14为本申请实施例提供的资源调度器的一个结构示意图。The embodiment of the present application also provides a resource scheduler, which is applied to a cloud computing system. Please refer to Figure 14, which is a schematic diagram of the structure of the resource scheduler provided in the embodiment of the present application.
如图14所示,资源调度器1400包括获取单元1401和处理单元1402。As shown in FIG. 14 , the resource scheduler 1400 includes an acquisition unit 1401 and a processing unit 1402 .
获取单元1401,用于获取计算机系统中包括的至少一个延迟敏感LC型应用中每个LC型应用的剩余干扰容忍度;从多个LC型应用中,获取剩余干扰容忍度最小的第一LC型应用。The acquisition unit 1401 is used to acquire the residual interference tolerance of each LC type application in at least one delay-sensitive LC type application included in the computer system; and acquire the first LC type application with the smallest residual interference tolerance from multiple LC type applications.
处理单元1402,用于:The processing unit 1402 is used to:
若第一LC型应用的剩余干扰容忍度小于容忍度下限,则增加第一LC型应用的第一隔离区资源。If the remaining interference tolerance of the first LC type application is less than the tolerance lower limit, the first isolation area resources of the first LC type application are increased.
若第一LC型应用的剩余干扰容忍度大于容忍度上限,则将多个LC型应用中的第二LC型应用的第二隔离区资源转移至资源共享区。If the remaining interference tolerance of the first LC-type application is greater than the tolerance upper limit, the second isolation area resources of the second LC-type application among the multiple LC-type applications are transferred to the resource sharing area.
在一些可选的实施例中,获取单元1401还用于获取计算机系统的当前系统熵,当前系统熵用于指示当前计算机系统中应用之间的干扰程度。In some optional embodiments, the acquisition unit 1401 is further used to acquire the current system entropy of the computer system, where the current system entropy is used to indicate the degree of interference between applications in the current computer system.
处理单元1402,具体用于:The processing unit 1402 is specifically configured to:
若当前系统熵大于或等于系统熵阈值,且第一LC型应用的剩余干扰容忍度小于容忍度下限,则增加第一隔离区资源。If the current system entropy is greater than or equal to the system entropy threshold, and the remaining interference tolerance of the first LC type application is less than the tolerance lower limit, then the resources of the first isolation area are increased.
若当前系统熵大于或等于系统熵阈值,且第一LC型应用的剩余干扰容忍度大于容忍度上限,则将第二LC型应用的第二隔离区资源转移至资源共享区。If the current system entropy is greater than or equal to the system entropy threshold, and the remaining interference tolerance of the first LC type application is greater than the tolerance upper limit, the second isolation area resources of the second LC type application are transferred to the resource sharing area.
在一些可选的实施例中,处理单元1402,具体用于:In some optional embodiments, the processing unit 1402 is specifically configured to:
若计算机系统中存在剩余干扰容忍度大于容忍度上限,且具有可剥离的第三隔离区资源的第三LC型应用,则将第三隔离区资源中的资源转移至第一LC型应用对应的第一隔离区,以增加第一隔离区资源;If there is a third LC type application in the computer system whose residual interference tolerance is greater than the tolerance upper limit and which has a third isolation area resource that can be stripped, then the resources in the third isolation area resource are transferred to the first isolation area corresponding to the first LC type application to increase the first isolation area resources;
若计算机系统中不存在第三LC型应用,则将资源共享区的资源转移至第一隔离区,以增加第一隔离区资源。If the third LC type application does not exist in the computer system, the resources of the resource sharing area are transferred to the first isolation area to increase the resources of the first isolation area.
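The choice of where the extra first isolation area resources come from can be sketched as below. The class `Candidate`, the field `strippable_units`, the function `pick_source`, and the `"shared_area"` marker are hypothetical; taking the first qualifying application when several qualify is also an assumption, since the description does not specify a tie-breaking rule.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    remaining_tolerance: float  # remaining interference tolerance
    strippable_units: int       # isolation area resources that may be removed


def pick_source(lc_apps, first_name, upper_limit):
    """Return the name of a third LC type application to take from,
    or "shared_area" if no such application exists."""
    for app in lc_apps:
        if (app.name != first_name
                and app.remaining_tolerance > upper_limit
                and app.strippable_units > 0):
            return app.name
    return "shared_area"


apps = [
    Candidate("app1", 0.1, 0),  # the first LC type application
    Candidate("app2", 0.9, 2),  # has slack and strippable resources
]
print(pick_source(apps, first_name="app1", upper_limit=0.8))
```

Preferring a third LC type application over the resource sharing area leaves the shared resources untouched whenever another isolation area has spare capacity.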
在一些可选的实施例中,获取单元1401,具体用于:In some optional embodiments, the acquiring unit 1401 is specifically configured to:
获取每个LC型应用能容忍的干扰量和每个LC型应用实际受到的干扰量。Obtain the amount of interference that each LC type application can tolerate and the amount of interference each LC type application actually receives.
根据每个LC型应用能容忍的干扰量和每个LC型应用实际受到的干扰量,确定当前系统熵。The current system entropy is determined based on the amount of interference that each LC type application can tolerate and the amount of interference that each LC type application actually receives.
在一些可选的实施例中,计算机系统还包括至少一个尽力而为BE型应用。In some optional embodiments, the computer system further includes at least one best effort BE type application.
获取单元1401,还用于获取来自于应用区分器或来自于用户的第一应用标识和第二应用标识,第一应用标识用于指示LC型应用,第二应用标识用于指示BE型应用。The acquisition unit 1401 is further used to acquire a first application identifier and a second application identifier from an application distinguisher or from a user, wherein the first application identifier is used to indicate an LC type application, and the second application identifier is used to indicate a BE type application.
处理单元1402,还用于根据第一应用标识,确定计算机系统中的LC型应用;根据第二应用标识,确定计算机系统中的BE型应用。The processing unit 1402 is further configured to determine an LC type application in the computer system according to the first application identifier; and to determine a BE type application in the computer system according to the second application identifier.
获取单元1401,具体用于:The acquisition unit 1401 is specifically used for:
获取每个LC型应用能容忍的干扰量和每个LC型应用实际受到的干扰量。Obtain the amount of interference that each LC type application can tolerate and the amount of interference each LC type application actually receives.
根据每个LC型应用能容忍的干扰量和每个LC型应用实际受到的干扰量,确定多个LC型应用的熵。The entropies of the plurality of LC type applications are determined according to the amount of interference that each LC type application can tolerate and the amount of interference that each LC type application actually receives.
获取至少一个BE型应用中每个BE型应用单独运行时的第一每周期指令数和每个BE型应用受干扰后的第二每周期指令数。A first number of instructions per cycle when each BE type application in at least one BE type application runs independently and a second number of instructions per cycle after each BE type application is disturbed are obtained.
根据第一每周期指令数和第二每周期指令数,确定至少一个BE型应用的熵。An entropy of at least one BE-type application is determined based on the first number of instructions per cycle and the second number of instructions per cycle.
根据多个LC型应用的熵和至少一个BE型应用的熵,确定当前系统熵。A current system entropy is determined based on the entropies of the plurality of LC-type applications and the entropy of at least one BE-type application.
在一些可选的实施例中,获取单元1401,还用于获取来自于服务质量预测器的第一调度信息或第二调度信息,第一调度信息指示增加多个LC型应用中目标LC型应用的隔离区资源,第二调度信息指示减少目标LC型应用的隔离区资源。In some optional embodiments, the acquisition unit 1401 is also used to obtain first scheduling information or second scheduling information from a service quality predictor, the first scheduling information indicates increasing the isolation area resources of a target LC type application among multiple LC type applications, and the second scheduling information indicates reducing the isolation area resources of the target LC type application.
处理单元1402,还用于根据第一调度信息,增加目标LC型应用的隔离区资源;或者,根据第二调度信息,减少目标LC型应用的隔离区资源。The processing unit 1402 is further configured to increase the isolation area resources of the target LC type application according to the first scheduling information; or to reduce the isolation area resources of the target LC type application according to the second scheduling information.
在一些可选的实施例中,获取单元1401,还用于获取计算机系统资源调度后的系统熵。In some optional embodiments, the acquisition unit 1401 is further used to acquire system entropy after computer system resources are scheduled.
处理单元1402,还用于若资源调度后的系统熵小于当前系统熵,则确定资源调度成功。The processing unit 1402 is further configured to determine that the resource scheduling is successful if the system entropy after the resource scheduling is less than the current system entropy.
资源调度器1400可以执行前述图1a至图8、以及图13a和图13b所示实施例中资源调度器所执行的操作,此处不再赘述。The resource scheduler 1400 can execute the operations performed by the resource scheduler in the embodiments shown in the aforementioned Figures 1a to 8, and Figures 13a and 13b, which will not be repeated here.
本申请实施例还提供了一种服务质量预测器,该服务质量预测器应用于云计算系统。请参阅图15,图15为本申请实施例提供的服务质量预测器的一个结构示意图。The embodiment of the present application also provides a service quality predictor, which is applied to a cloud computing system. Please refer to Figure 15, which is a schematic diagram of the structure of the service quality predictor provided by the embodiment of the present application.
As shown in FIG. 15, the service quality predictor 1500 includes an acquisition unit 1501, a processing unit 1502, and a sending unit 1503.
获取单元1501,用于获取目标LC型应用对应的多个网络接收队列长度。The acquisition unit 1501 is used to acquire multiple network receiving queue lengths corresponding to the target LC type application.
处理单元1502,用于计算多个网络接收队列长度的均值,得到平均网络接收队列长度;The processing unit 1502 is used to calculate the average of the lengths of multiple network receiving queues to obtain an average network receiving queue length;
发送单元1503,用于:The sending unit 1503 is used to:
若平均网络接收队列长度大于长度阈值,则向资源调度器发送第一调度信息,第一调度信息指示增加目标LC型应用的隔离区资源;If the average network receiving queue length is greater than the length threshold, first scheduling information is sent to the resource scheduler, where the first scheduling information indicates to increase the isolation area resources of the target LC type application;
若平均网络接收队列长度小于或等于长度阈值,且多个网络接收队列长度中取值为0的网络接收队列长度在多个网络接收队列长度中的占比大于比例阈值,则向资源调度器发送第二调度信息,第二调度信息指示减少目标LC型应用的隔离区资源。If the average network receive queue length is less than or equal to the length threshold, and the proportion of network receive queue lengths with a value of 0 among multiple network receive queue lengths is greater than the proportion threshold, then a second scheduling information is sent to the resource scheduler, and the second scheduling information indicates to reduce the isolation area resources of the target LC type application.
服务质量预测器1500可以执行前述图9、图13a和图13b所示实施例中服务质量预测器所执行的操作,此处不再赘述。The service quality predictor 1500 can execute the operations executed by the service quality predictor in the embodiments shown in the aforementioned FIG. 9 , FIG. 13 a and FIG. 13 b , which will not be described in detail here.
本申请实施例还提供了一种应用区分器,请参阅图16,图16为本申请实施例提供的应用区分器的一个结构示意图。The embodiment of the present application also provides an application distinguisher. Please refer to Figure 16, which is a structural diagram of the application distinguisher provided in the embodiment of the present application.
As shown in FIG. 16, the application distinguisher 1600 includes an acquisition unit 1601, a processing unit 1602, and a sending unit 1603.
获取单元1601,用于获取计算机系统中多个应用在当前阶段的多个网络总带宽平均值。The acquisition unit 1601 is used to acquire the average values of the total network bandwidths of multiple applications in the computer system at the current stage.
处理单元1602,用于根据多个网络总带宽平均值,确定多个应用在当前阶段的多个网络总带宽区间变异系数。The processing unit 1602 is used to determine multiple interval variation coefficients of the total network bandwidths of multiple applications at the current stage according to the average values of the multiple total network bandwidths.
The acquisition unit 1601 is further configured to: if the multiple total network bandwidth interval coefficients of variation are greater than the interval coefficient threshold, acquire the multiple absolute differences between the end send/receive bandwidth ratios of the multiple applications in the current phase and their start send/receive bandwidth ratios in the next phase; and acquire the multiple send/receive bandwidth ratio coefficients of variation of the multiple applications in the target time periods before and after the current phase.
处理单元1602,还用于根据多个差值绝对值或多个发送/接收带宽比例变异系数,确定多个应用中LC型应用的标识为第一应用标识,多个应用中BE型应用的标识为第二应用标识。The processing unit 1602 is further configured to determine, based on multiple difference absolute values or multiple transmission/reception bandwidth ratio variation coefficients, an identifier of an LC type application among multiple applications as a first application identifier and an identifier of a BE type application among multiple applications as a second application identifier.
发送单元1603,用于向资源调度器发送第一应用标识和第二应用标识,以使资源调度器区分BE型应用和LC型应用。The sending unit 1603 is used to send the first application identifier and the second application identifier to the resource scheduler, so that the resource scheduler can distinguish between the BE type application and the LC type application.
在一些可选的实施例中,处理单元1602,具体用于:In some optional embodiments, the processing unit 1602 is specifically configured to:
From the multiple applications, determine that any application whose absolute difference is greater than the difference threshold, and/or whose send/receive bandwidth ratio coefficient of variation is greater than the coefficient threshold, is a BE type application.
Mark the identifier of each BE type application as the second application identifier.
Mark the identifier of each application other than the BE type applications as the first application identifier.
应用区分器1600可以执行前述图11至图13b所示实施例中应用区分器所执行的操作,此处不再赘述。The application distinguisher 1600 can perform the operations performed by the application distinguisher in the embodiments shown in the aforementioned Figures 11 to 13b, which will not be repeated here.
下面,对本申请实施例提供的计算机设备进行说明,请参阅图17,图17为本申请实施例提供的计算机设备的一个结构示意图。该计算机设备1700包括:处理器1701和存储器1702,存储器1702中存储有一个或一个以上的应用程序或数据。Next, the computer device provided in the embodiment of the present application is described, and please refer to Figure 17, which is a schematic diagram of the structure of the computer device provided in the embodiment of the present application. The computer device 1700 includes: a processor 1701 and a memory 1702, and the memory 1702 stores one or more application programs or data.
其中,存储器1702可以是易失性存储或持久存储。存储在存储器1702的程序可以包括一个或一个以上模块,每个模块可以用于执行计算机设备1700所执行的一系列操作。更进一步地,处理器1701可以与存储器1702通信,在计算机设备1700上执行存储器1702中的一系列指令操作。处理器1701可以是中央处理器(central processing units,CPU),也可以是单核处理器,除此之外,还可以是其他类型的处理器,例如双核处理器,具体此处不做限定。Among them, the memory 1702 can be a volatile storage or a persistent storage. The program stored in the memory 1702 can include one or more modules, each of which can be used to execute a series of operations performed by the computer device 1700. Furthermore, the processor 1701 can communicate with the memory 1702 and execute a series of instruction operations in the memory 1702 on the computer device 1700. The processor 1701 can be a central processing unit (CPU) or a single-core processor. In addition, it can also be other types of processors, such as a dual-core processor, which is not limited here.
The computer device 1700 may further include one or more communication interfaces 1703 and one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
In some optional embodiments, the computer device may further include a last level cache (LLC) and a main memory; this is not specifically limited here.
When the computer device 1700 serves as the resource scheduler, it can perform the operations performed by the resource scheduler in the embodiments shown in FIG. 1a to FIG. 8, FIG. 13a, and FIG. 13b. When the computer device 1700 serves as the service quality predictor, it can perform the operations performed by the service quality predictor in the embodiments shown in FIG. 1a to FIG. 9, FIG. 13a, and FIG. 13b. When the computer device 1700 serves as the application distinguisher, it can perform the operations performed by the application distinguisher in the embodiments shown in FIG. 1a to FIG. 8 and FIG. 11 to FIG. 13b. Details are not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Claims (28)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210303023.4A CN116846978A (en) | 2022-03-25 | 2022-03-25 | Resource scheduling method, application identification method and related equipment of cloud computing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210303023.4A CN116846978A (en) | 2022-03-25 | 2022-03-25 | Resource scheduling method, application identification method and related equipment of cloud computing system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116846978A (en) | 2023-10-03 |
Family
ID=88160455
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210303023.4A Pending CN116846978A (en) | 2022-03-25 | 2022-03-25 | Resource scheduling method, application identification method and related equipment of cloud computing system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116846978A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119938344A (en) * | 2025-04-09 | 2025-05-06 | 中国科学院计算技术研究所 | Data center resource scheduling method, device, storage medium and electronic equipment |
CN119961008A (en) * | 2025-04-09 | 2025-05-09 | 中国科学院计算技术研究所 | Resource scheduling method and device for data center based on application resource sensitivity |
CN119961007A (en) * | 2025-04-09 | 2025-05-09 | 中国科学院计算技术研究所 | Data center resource scheduling method and device |
2022-03-25: CN application CN202210303023.4A (patent/CN116846978A/en), status active, pending
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN116846978A (en) | Resource scheduling method, application identification method and related equipment of cloud computing system | |
Ramakrishnan | Performance considerations in designing network interfaces | |
EP3624400B1 (en) | Technologies for deploying virtual machines in a virtual network function infrastructure | |
Almeida et al. | Providing differentiated levels of service in web content hosting | |
US20020112102A1 (en) | Computer forming logical partitions | |
CN112162865A (en) | Server scheduling method and device and server | |
US20170061566A1 (en) | Technologies for offloading network packet processing to a gpu | |
US9197566B2 (en) | Information processing method, recording medium, and information processing apparatus | |
Wu et al. | The performance analysis of Linux networking–packet receiving | |
US11438271B2 (en) | Method, electronic device and computer program product of load balancing | |
CN109614228B (en) | Comprehensive monitoring front-end system based on dynamic load balancing mode and working method | |
CN117076133B (en) | Cloud game platform heterogeneous resource allocation method, computer device and storage medium | |
CN112769905B (en) | NUMA (non uniform memory access) architecture based high-performance network card performance optimization method under Feiteng platform | |
CN114564313A (en) | Load adjustment method and device, electronic equipment and storage medium | |
CN109039933B (en) | A cluster network optimization method, device, equipment and medium | |
JP2011203810A (en) | Server, computer system, and virtual computer management method | |
US8780723B2 (en) | Communication system and communication apparatus | |
KR20120121146A (en) | Method and Apparatus for resource allocation in virtual network environment | |
CN114675972B (en) | Cloud network resource flexible scheduling method and system based on integral algorithm | |
CN115378885B (en) | Virtual machine service network bandwidth management method and device under super fusion architecture | |
Zhang et al. | Understanding and Enhancing Linux Kernel-based Packet Switching on WiFi Access Points | |
Soryani et al. | Improving inter-node communications in multi-core clusters using a contention-free process mapping algorithm | |
CN112994908B (en) | Network slice message transmission method, electronic equipment and storage medium | |
KR20130075377A (en) | Apparatus and method for allocating processor in virtualization environment | |
CN109062707B (en) | Electronic device, method for limiting inter-process communication thereof and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||