
CN111796932A - A GPU resource scheduling method - Google Patents

A GPU resource scheduling method

Info

Publication number: CN111796932A
Application number: CN202010576793.7A
Authority: CN (China)
Prior art keywords: gpu, application, scheduling, gpus, cluster
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 徐山川, 王滨, 王臣汉
Current assignee: Beijing Computing Tianjin Information Technology Co ltd
Original assignee: Beijing Computing Tianjin Information Technology Co ltd
Application filed by Beijing Computing Tianjin Information Technology Co ltd
Priority to CN202010576793.7A
Publication of CN111796932A


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the technical field of communication applications and discloses a GPU resource scheduling method comprising the following steps. S1: collect basic information about the GPUs in the cluster and provide a gpu-usages interface; go to step S2. S2: create a GPU application and send an application request to the Kubernetes scheduler; go to step S3. S3: upon receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster; go to step S4. S4: use the gpu-usages interface to compute a GPU that satisfies the application's scheduling requirements; go to step S5. S5: the GPU manager binds the specified GPU resources to the application on the machine where the GPU recorded on the application is located. The method allows a single GPU to be shared by multiple applications in proportion to GPU video memory and GPU computing power, greatly improving the utilization of a single GPU and reducing the cost of GPU applications.

Description

A GPU Resource Scheduling Method

Technical Field

The present invention relates to the technical field of communication applications, and in particular to a GPU resource scheduling method.

Background Art

With the explosive growth of device performance and the growing adoption of virtualization technology, there is an urgent need to dynamically allocate and flexibly schedule the resources of multiple virtualized devices on top of existing physical devices, and to improve resource utilization, so as to meet users' day-to-day needs.

Managing enterprise server clusters with Kubernetes greatly reduces operation and maintenance costs and improves resource utilization, but so far Kubernetes manages each machine's resources mainly in terms of CPU, memory, storage, and other hardware. As more and more enterprises use GPUs for machine-learning model training and online services, efficient management of GPU resources is becoming increasingly important.

Defect of the prior art: GPU resources are allocated in units of physical GPU cards, so multiple applications cannot share GPU resources. As a result, even if a single application does not fully use its allocated computing resources, the exclusively held resources cannot be assigned to other applications, and GPU resources cannot be fully utilized.

Summary of the Invention

The main purpose of the present invention is to provide a GPU resource scheduling method that solves the current problem of GPU resources being underutilized by single applications.

To achieve the above purpose, the present invention provides the following technical solution:

A GPU resource scheduling method, comprising the steps of:

S1. First, collect basic information about the GPUs in the cluster and provide the gpu-usages interface; go to step S2.

S2. Create a GPU application and send an application request to the Kubernetes scheduler; go to step S3.

S3. Upon receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster; go to step S4.

S4. Use the gpu-usages interface to compute a GPU that satisfies the application's scheduling requirements; go to step S5.

S5. The GPU manager binds the specified GPU resources to the application on the machine where the GPU recorded on the application is located.

Further, in step S2, when creating the GPU application, the application specifies the video memory and computing power it requires.

Further, in step S1, the basic GPU information collected includes the GPU model, video memory, and GPU cores.

Further, in step S4, if no GPU in the cluster satisfies the application's scheduling requirements, go to step S6: isolation of GPU resources.

Further, S6 comprises steps S60 and S61. S60: if the video memory required by the application exceeds the preset value, or exceeds the video memory of every GPU in the cluster, return a memory-allocation failure. S61: wrap the execution thread and periodically check the application's GPU core usage; if it exceeds the configured core-usage limit, or exceeds the video memory of every GPU in the cluster, move the current execution thread into the set of waiting threads.

Further, in step S2, when creating the GPU application, the required GPU model and number of GPUs should also be provided.

Further, in step S4, the first GPU that satisfies the requirements is selected, and the application is annotated with the name of the machine the GPU resides on and the GPU's index within that machine.

Further, in step S4, machines with the required number of idle GPUs are found through the gpu-usages interface, and among them the machine with the fewest idle GPUs is selected and its name is added to the application.

Further, in step S5, the GPU manager uses exhaustive search to allocate GPUs to the application, completing the scheduling and binding of GPU resources.

Further, the method completes GPU resource scheduling for one GPU application or for multiple GPU applications.

Compared with the prior art, the present invention provides the following technical effects:

1. A single GPU can be shared by multiple applications in proportion to GPU video memory and GPU computing power, greatly improving the utilization of a single GPU and reducing the cost of GPU applications.

2. The topology between GPUs is considered when scheduling multiple GPUs, maximizing the communication efficiency among the GPUs assigned to one application and improving the application's GPU performance.

3. Centralized resource allocation is supported when scheduling GPU applications in a Kubernetes cluster, i.e. machines that already host many GPU applications are preferred, ensuring that later multi-GPU applications can still be scheduled successfully onto the cluster.

Brief Description of the Drawings

The accompanying drawings, which form a part of the present invention, are provided to give a further understanding of the invention and to make its other features, objects, and advantages more apparent. The drawings of the exemplary embodiments and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:

Figure 1 is the overall flow chart of the GPU resource scheduling method of the present invention;

Figure 2 is a flow chart comparing the default scheduling strategy of the prior art with the single-GPU sharing of the present invention;

Figure 3 is a schematic diagram of the DGX-1 topology in an embodiment of the present invention;

Figure 4 compares multi-GPU allocation in the prior art, which does not consider the topology between GPUs, with the topology-aware multi-GPU allocation of the present invention;

Figure 5 is a flow chart of scheduling multiple GPU applications under the default uniform scheduling strategy of the prior art and under the centralized scheduling strategy of the present invention;

Figure 6 is a schematic diagram of an example GPU topology in an embodiment of the present invention.

Detailed Description

To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the invention.

It should be noted that the terms "first", "second", and the like in the description, the claims, and the drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate for the embodiments described herein. Furthermore, the terms "comprising" and "having", and any variants thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such processes, methods, products, or devices.

In the present invention, orientation or position terms such as "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "transverse", and "longitudinal" indicate orientations or positional relationships based on those shown in the drawings. These terms are used mainly to better describe the invention and its embodiments and do not require that the indicated device, element, or component have a particular orientation, or be constructed and operated in a particular orientation.

Moreover, besides indicating orientation or position, some of the above terms may carry other meanings; for example, "on" may in some cases also indicate a relationship of attachment or connection. Those of ordinary skill in the art can understand the specific meanings of these terms in the invention according to the specific situation.

In addition, the term "plurality" shall mean two or more.

It should be noted that, in the absence of conflict, the embodiments of the present invention and the features in the embodiments may be combined with one another. The invention is described in detail below with reference to the drawings and in conjunction with the embodiments.

Embodiment 1

As shown in Figures 1 and 2, for an application that needs only one GPU, resources are allocated according to the GPU video memory and the number of cores the application requires, rather than allocating a complete GPU to the application. The default GPU resource manager does not support allocation according to the resources an application needs; instead it locks an entire GPU and assigns it to the requesting application.

A GPU resource scheduling method comprises the following steps:

S1. First, collect basic information about the GPUs in the cluster and provide the gpu-usages interface; go to step S2. The basic information collected in step S1 includes the GPU model, video memory, and GPU cores, which makes it easy for the scheduler to obtain cluster-wide GPU resource information.

S2. Create a GPU application and send an application request to the Kubernetes scheduler; go to step S3. In step S2, when creating the GPU application, the application specifies the video memory and computing power it requires. Because the number of cores differs greatly between GPU models and is not something application developers generally know, computing power is expressed directly as a percentage of the cores. For example, a GPU application might give the cluster a request such as: GPU resources of model T4, with 4 GB of video memory and 25% of the cores.
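The request described above can be sketched as a small data structure. This is an illustrative sketch, not from the patent, and all field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class GPURequest:
    """A hypothetical GPU application request, as described in step S2."""
    model: str          # required GPU model, e.g. "T4"
    memory_gb: int      # required video memory in GB
    core_percent: int   # required share of GPU cores, as a percentage

# The example from the text: a T4 GPU, 4 GB of memory, 25% of its cores.
request = GPURequest(model="T4", memory_gb=4, core_percent=25)
print(request.model, request.memory_gb, request.core_percent)  # T4 4 25
```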

S3. Upon receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster; go to step S4.

S4. Use the gpu-usages interface to compute a GPU that satisfies the application's scheduling requirements; go to step S5. In step S4, if no GPU in the cluster satisfies the application's scheduling requirements, go to step S6, isolation of GPU resources. In step S4, the first GPU that satisfies the requirements is selected, and the application is annotated with the name of the machine the GPU resides on and the GPU's index within that machine.
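The first-fit selection in step S4 can be sketched as follows, assuming the gpu-usages interface returns per-machine lists of per-GPU free resources; the data shape and function name are hypothetical:

```python
# A minimal first-fit sketch of step S4. The structure returned by the
# gpu-usages interface is assumed; real code would query the cluster.

def find_first_fit(gpu_usages, model, memory_gb, core_percent):
    """Return (machine, gpu_index) for the first GPU satisfying the request,
    or None if no GPU in the cluster qualifies (which leads to step S6)."""
    for machine, gpus in gpu_usages.items():
        for idx, gpu in enumerate(gpus):
            if (gpu["model"] == model
                    and gpu["free_memory_gb"] >= memory_gb
                    and gpu["free_core_percent"] >= core_percent):
                return machine, idx  # annotate the application with these
    return None

usages = {
    "node-1": [{"model": "T4", "free_memory_gb": 2, "free_core_percent": 50}],
    "node-2": [{"model": "T4", "free_memory_gb": 8, "free_core_percent": 40}],
}
print(find_first_fit(usages, "T4", 4, 25))  # ('node-2', 0)
```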

S5. The GPU manager binds the specified GPU resources to the application on the machine recorded on the application.

Further, S6 comprises steps S60 and S61. S60: if the video memory required by the application exceeds the preset value, or exceeds the video memory of every GPU in the cluster, return a memory-allocation failure. S61: wrap the execution thread and periodically check the application's GPU core usage; if it exceeds the configured core-usage limit, or exceeds the video memory of every GPU in the cluster, move the current execution thread into the set of waiting threads. After shared GPU scheduling is completed, the GPU manager has allocated GPU video memory and GPU cores to the applications, but without a corresponding resource-isolation mechanism there is no guarantee that an application will not use more GPU resources than agreed and prevent other applications from working normally.
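The isolation checks S60 and S61 can be sketched as below. This is a simplified model under assumed inputs, since real enforcement would have to hook the GPU runtime:

```python
# Sketch of the isolation steps S60/S61; all limits and data shapes are
# illustrative assumptions, not the patent's concrete implementation.

def check_memory(requested_gb, preset_limit_gb, max_cluster_gpu_gb):
    """S60: fail the allocation if the request exceeds the preset limit
    or the video memory of every GPU in the cluster."""
    if requested_gb > preset_limit_gb or requested_gb > max_cluster_gpu_gb:
        return "allocation failed"
    return "ok"

def schedule_tick(current_usage_percent, allowed_percent, running, waiting):
    """S61: periodic check of a wrapped execution thread; if it exceeds its
    allowed core share, move it from the running set to the waiting set."""
    if current_usage_percent > allowed_percent and running:
        waiting.append(running.pop())
    return running, waiting

print(check_memory(requested_gb=40, preset_limit_gb=16, max_cluster_gpu_gb=32))
# allocation failed
```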

Further, the method can complete GPU resource scheduling for one GPU application or for multiple GPU applications.

Embodiment 2

As shown in Figures 1, 3, 4, 5, and 6, for applications that need multiple GPUs, allocation follows the GPU group with the highest communication efficiency. GPUs are interconnected differently within a machine, and the communication speed between GPUs differs accordingly. As shown in Figure 3, a DGX-1 machine contains 8 GPUs, of which GPU0 connects directly to GPU1, GPU2, GPU3, and GPU4 via NVLink with a communication bandwidth of up to 40 GB/s, whereas connections from GPU0 to GPU5, GPU6, and GPU7 must go through a PCIe switch and QPI, whose communication efficiency is far lower than NVLink's. When assigning multiple GPUs to an application, the connection structure among the assigned GPUs, also called the GPU topology, should therefore be considered. The topology can be obtained from the GPU driver, and from it the communication efficiency between any two GPUs can be derived. An example GPU topology is shown in Figure 6.
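One simple way to model the topology obtained from the driver is a symmetric bandwidth matrix. The 40 GB/s NVLink figure comes from the text; the lower PCIe/QPI figure here is an illustrative assumption:

```python
# Modelling the inter-GPU topology as a symmetric bandwidth matrix (GB/s).
# NVLINK matches the 40 GB/s in the text; PCIE is an assumed placeholder.

NVLINK, PCIE = 40, 10

def link_bandwidth(topology, a, b):
    """Look up the link bandwidth between GPU a and GPU b."""
    return topology[a][b]

# A toy 4-GPU topology: GPU0-GPU1 and GPU2-GPU3 via NVLink, the rest PCIe.
topo = [
    [0,      NVLINK, PCIE,   PCIE],
    [NVLINK, 0,      PCIE,   PCIE],
    [PCIE,   PCIE,   0,      NVLINK],
    [PCIE,   PCIE,   NVLINK, 0],
]
print(link_bandwidth(topo, 0, 1))  # 40
```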

The method supports a centralized occupancy scheme for GPU applications. The default Kubernetes resource scheduling policy spreads resources uniformly: for a cluster, it tries to distribute deployed applications evenly across the nodes, which maximizes application availability, since a problem on one machine does not affect applications on the others. For multi-GPU applications, however, this policy prevents full use of the GPU resources. As shown in Figure 5, path a uses the default uniform scheduling strategy: GPU resources are used evenly, and a new multi-GPU application cannot be scheduled. Path b uses the centralized scheduling strategy: GPU applications are scheduled onto busier machines whenever possible, so a later multi-GPU application can still be scheduled.
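The difference between the two placement policies can be sketched as follows; the data shapes are assumptions for illustration:

```python
# Uniform scheduling picks the machine with the most idle GPUs; the
# centralized scheme picks the machine with the fewest idle GPUs that
# still satisfies the request, keeping large contiguous capacity free.

def place(idle_gpus_per_machine, needed, centralized=True):
    """Return the chosen machine name, or None if no machine fits."""
    candidates = {m: n for m, n in idle_gpus_per_machine.items() if n >= needed}
    if not candidates:
        return None
    pick = min if centralized else max
    return pick(candidates, key=candidates.get)

idle = {"node-1": 3, "node-2": 4}
print(place(idle, needed=2, centralized=True))   # busiest machine that fits
print(place(idle, needed=2, centralized=False))  # machine with most idle GPUs
```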

The following is the flow of a complete multi-GPU application deployment:

S1. First, collect basic information about the GPUs in the cluster and provide the gpu-usages interface; go to step S2. The basic information collected in step S1 includes the GPU model, video memory, and GPU cores, which makes it easy for the scheduler to obtain cluster-wide GPU resource information.

S2. Create a GPU application and send an application request to the Kubernetes scheduler; go to step S3. In step S2, when creating the GPU application, the application specifies the video memory and computing power it requires. Because the number of cores differs greatly between GPU models and is not something application developers generally know, computing power is expressed directly as a percentage of the cores. For example, a GPU application might give the cluster a request such as: GPU resources of model T4, with 4 GB of video memory and 25% of the cores. When creating the GPU application, the required GPU model and number of GPUs should also be provided. For a multi-GPU application, only the GPU model and the number of GPUs are needed; for example, such an application might give the cluster a request such as: model T4, 2 GPUs.

S3. Upon receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster; go to step S4.

S4. Use the gpu-usages interface to compute a GPU that satisfies the application's scheduling requirements; go to step S5. In step S4, if no GPU in the cluster satisfies the application's scheduling requirements, go to step S6, isolation of GPU resources. In step S4, the first GPU that satisfies the requirements is selected, and the application is annotated with the name of the machine the GPU resides on and the GPU's index within that machine.

Machines with the required number of idle GPUs are found through the gpu-usages interface, and among them the machine with the fewest idle GPUs is selected and its name is added to the application. For a multi-GPU application, the gpu-usages interface is used to find machines with the required number of idle GPUs, and the one with the fewest idle GPUs is chosen. For example, if the application needs two T4 GPUs and this step finds 3 and 4 idle T4 GPUs on machine 1 and machine 2 respectively, machine 1 is selected as the application's scheduling machine and its information is added to the application.

S5. The GPU manager binds the specified GPU resources to the application on the machine recorded on the application. In step S5, the GPU manager uses exhaustive search to allocate GPUs to the application, completing the scheduling and binding of GPU resources.

On the machine the application has been assigned to, the GPU manager uses exhaustive search to find the group of GPUs with the highest connection efficiency and allocates it to the application, completing GPU resource scheduling and binding. For example, for an application that needs two V100 GPUs and has been assigned to a DGX-1 machine on which GPU0, GPU1, and GPU7 are idle, the combinations (GPU0, GPU1), (GPU0, GPU7), and (GPU1, GPU7) are enumerated exhaustively, and (GPU0, GPU1) is selected as the finally bound pair.
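The exhaustive search described above can be sketched as enumerating every combination of idle GPUs of the requested size and keeping the group with the highest total pairwise bandwidth; the bandwidth values are illustrative assumptions:

```python
from itertools import combinations

# Sketch of the exhaustive selection in step S5: score each candidate
# group by the sum of pairwise link bandwidths and keep the best group.

def best_group(idle, count, bandwidth):
    """bandwidth[a][b] gives the link bandwidth between GPUs a and b."""
    def score(group):
        return sum(bandwidth[a][b] for a, b in combinations(group, 2))
    return max(combinations(idle, count), key=score)

# The situation from the text: GPU0, GPU1, GPU7 idle, two GPUs requested;
# GPU0-GPU1 share a fast NVLink link (assumed 40 vs 10 GB/s elsewhere).
bw = {0: {1: 40, 7: 10}, 1: {0: 40, 7: 10}, 7: {0: 10, 1: 10}}
print(best_group([0, 1, 7], 2, bw))  # (0, 1)
```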

Further, S6 comprises steps S60 and S61. S60: if the video memory required by the application exceeds the preset value, or exceeds the video memory of every GPU in the cluster, return a memory-allocation failure. S61: wrap the execution thread and periodically check the application's GPU core usage; if it exceeds the configured core-usage limit, or exceeds the video memory of every GPU in the cluster, move the current execution thread into the set of waiting threads. After shared GPU scheduling is completed, the GPU manager has allocated GPU video memory and GPU cores to the applications, but without a corresponding resource-isolation mechanism there is no guarantee that an application will not use more GPU resources than agreed and prevent other applications from working normally.

Further, the method completes GPU resource scheduling for one GPU application or for multiple GPU applications.

实施例3Example 3

如图1、2、3、4、5和6所示;对于需要多个GPU的应用:按照通讯效率最高的GPU组进行分配。GPU在机器中的连接结构不同,GPU之间的通讯速度也会不同。如附图3所示,DGX-1机器中包含8个GPU,其中GPU0与GPU1、GPU2、GPU3、GPU4可以直接通过NVLink的方式连接,其通讯带宽可以达到40GB/s。而GPU0与GPU5、GPU6、GPU7连接则需要通过PCIe Switch以及QPI完成,相比NVLink其通讯效率大打折扣。当向应用分配多个GPU时应考虑所分配的多个GPU间的连接结构,也称为GPU的拓扑结构。通过GPU的驱动可以获取GPU之间的拓扑结构,而通过该拓扑结构可以连接到各个GPU之间的通讯效率。一个GPU拓扑结构的示例见附图6。As shown in Figures 1, 2, 3, 4, 5, and 6; for applications that require multiple GPUs: allocate according to the GPU group with the highest communication efficiency. The connection structure of GPUs in the machine is different, and the communication speed between GPUs will also be different. As shown in Figure 3, the DGX-1 machine contains 8 GPUs, of which GPU0, GPU1, GPU2, GPU3, and GPU4 can be directly connected through NVLink, and the communication bandwidth can reach 40GB/s. The connection between GPU0 and GPU5, GPU6, and GPU7 needs to be completed through PCIe Switch and QPI, which greatly reduces the communication efficiency compared with NVLink. When allocating multiple GPUs to an application, the connection structure between the multiple GPUs allocated should be considered, also known as the topology of the GPUs. The topology structure between the GPUs can be obtained through the driving of the GPU, and the communication efficiency between the GPUs can be connected through the topology structure. An example of a GPU topology is shown in Figure 6.

支持GPU应用的集中占用方案。默认的Kubernetes资源调度方式是一个资源均匀调度方案,即对于一个集群,其尽量将部署的应用均匀的分布在各个节点上,这样可以最大程度上保证应用的可用性,即当某台机器出现问题时其他机器中的应用不会受到影响。但是这种调度方案对于多GPU应用来说会导致其无法充分利用GPU资源,如附图5的所示,a路径为采用默认的均匀调度策略,GPU资源被均匀使用,在出现新的多GPU应用需求时无法完成调度;b路径采用集中调度策略,尽量将GPU应用调度到比较忙碌的机器,在有多GPU应用时可以完成调度。Supports centralized occupancy schemes for GPU applications. The default Kubernetes resource scheduling method is a resource uniform scheduling scheme, that is, for a cluster, it tries to distribute the deployed applications evenly on each node, so as to ensure the availability of the application to the greatest extent, that is, when a machine has a problem Applications in other machines will not be affected. However, for multi-GPU applications, this scheduling scheme will cause it to fail to make full use of GPU resources. As shown in Figure 5, path a adopts the default uniform scheduling strategy, and GPU resources are used evenly. When a new multi-GPU appears The scheduling cannot be completed when the application demands; the b path adopts a centralized scheduling strategy, and tries to schedule the GPU application to a relatively busy machine, and can complete the scheduling when there are multiple GPU applications.

下面是一次完成的多GPU应用部署的流程:The following is a complete multi-GPU application deployment process:

S1、首先从集群中收集GPU的基本信息,并提供gpu-usages接口,进入步骤S2;步骤S1中,收集GPU的基本信息包括GPU的型号、显存和GPU核心。便于调度器的获取集群GPU资源信息。S1. First, basic information of the GPU is collected from the cluster, and a gpu-usages interface is provided, and the process proceeds to step S2; in step S1, the basic information of the GPU is collected, including the model of the GPU, the video memory, and the GPU core. It is convenient for the scheduler to obtain cluster GPU resource information.

S2. A GPU application is created and an application request is sent to the Kubernetes scheduler; proceed to step S3. In step S2, when creating the GPU application, the application declares the video memory value and computing power value it requires. Because the number of cores varies widely across GPU models and is rarely known to application developers, computing power is expressed directly as a percentage of the cores. For example, a GPU application might submit information like the following to the cluster: a T4 GPU with 4 GB of video memory and 25% of the cores. When creating the GPU application, the required GPU model and number of GPUs should also be provided. For a multi-GPU application, only the GPU model and the number of GPUs are needed; for example, such an application might submit: model T4, 2 GPUs.
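The two request shapes described in step S2 (a shared slice of one GPU versus a number of whole GPUs) could be expressed as, say, the following; the class and field names are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class GPURequest:
    model: str            # required GPU model, e.g. "T4"
    count: int = 1        # whole GPUs, for multi-GPU applications
    memory_gb: float = 0  # only meaningful for shared single-GPU requests
    cores_pct: float = 0  # computing power as a percentage of cores

# The two examples from the text:
shared = GPURequest(model="T4", memory_gb=4, cores_pct=25)  # 4 GB, 25% cores
multi = GPURequest(model="T4", count=2)                     # two whole T4s
```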

S3. After receiving the application request, the Kubernetes scheduler traverses all GPU applications in the cluster; proceed to step S4.

S4. The GPUs that satisfy the application's scheduling requirements are computed through the gpu-usages interface; proceed to step S5. In step S4, if no GPU in the cluster satisfies the application's scheduling requirements, proceed to step S6, GPU resource isolation. In step S4, the first GPU that meets the requirements is taken, and the application is annotated with the name of the machine where the GPU resides and the GPU's index within that machine.

Through the gpu-usages interface, machines with the required number of idle GPUs are found, and among them the machine with the fewest idle GPUs is selected and its name added to the application. For a multi-GPU application, the gpu-usages interface is likewise used to find machines with the required number of idle GPUs, and the machine with the fewest idle GPUs is selected and its name added to the application. For example, if the application requires two T4 GPUs and this step finds 3 and 4 idle T4 GPUs on machine 1 and machine 2 respectively, then machine 1 is selected as the application's target machine and its information is added to the application.
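A minimal sketch of the selection rule in this step, with hypothetical machine names; given 3 idle T4s on machine 1 and 4 on machine 2 and a request for two, machine 1 is chosen:

```python
def pick_machine(idle_by_machine, needed):
    """Among machines with at least `needed` idle GPUs of the right
    model, pick the one with the fewest idle GPUs (best fit), which
    keeps larger free blocks available for later requests."""
    fits = {m: n for m, n in idle_by_machine.items() if n >= needed}
    return min(fits, key=fits.get) if fits else None
```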

S5. The GPU manager binds the specified GPU resources to the application according to the machine recorded on the application. In step S5, the GPU manager allocates GPUs to the application by exhaustive search, completing the scheduling and binding of GPU resources.

On the machine to which the application has been assigned, the GPU manager uses exhaustive search to find the group of GPUs with the highest connection efficiency and allocates it to the application, completing the scheduling and binding of GPU resources. For example, suppose an application requiring three V100 GPUs is assigned to a DGX-1 machine on which GPU0, GPU1, GPU3, GPU5, and GPU7 are idle. The combinations (GPU0, GPU1, GPU3), (GPU1, GPU3, GPU5), (GPU3, GPU5, GPU7), (GPU0, GPU1, GPU5), (GPU0, GPU1, GPU7), (GPU1, GPU3, GPU7), (...) are enumerated exhaustively, and (GPU0, GPU1, GPU3) is selected as the finally bound group.
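The exhaustive search in this example can be sketched as follows. The pairwise efficiency function is an illustrative placeholder (it simply treats GPUs 0 to 4 as fast NVLink neighbours, roughly matching the DGX-1 description above); on a real machine the efficiencies would come from the GPU topology:

```python
from itertools import combinations

def best_combination(idle, count, efficiency):
    """Enumerate every `count`-sized combination of idle GPUs and keep
    the one whose total pairwise communication efficiency is highest."""
    def score(group):
        return sum(efficiency(a, b) for a, b in combinations(group, 2))
    return max(combinations(sorted(idle), count), key=score)

# Illustrative DGX-1-like efficiency: pairs within GPU0..GPU4 are
# NVLink-connected (fast); any pair reaching GPU5/GPU7 is slower.
def eff(a, b):
    return 40.0 if a <= 4 and b <= 4 else 8.0

chosen = best_combination([0, 1, 3, 5, 7], 3, eff)
```

With these placeholder efficiencies the search reproduces the example in the text and selects (0, 1, 3). Exhaustive enumeration is affordable here because a machine holds at most a handful of GPUs.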

Further, S6 comprises steps S60 and S61. S60: if the video memory required by the application exceeds the preset value, or is greater than the video memory of every GPU in the cluster, a video memory allocation failure is returned. S61: the execution thread is wrapped, and the program's GPU core usage is checked periodically; if it exceeds the configured core usage value, or is greater than the video memory value of all GPUs in the cluster, the current execution thread is moved into the waiting threads. After shared GPU scheduling is completed, the GPU manager has allocated GPU video memory and GPU cores to the applications, but without a corresponding resource isolation mechanism there is no guarantee that an application will not use more GPU resources than agreed and thereby prevent other applications from working normally.
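The two isolation checks in S60/S61 could be sketched as follows. The limits and the way usage is sampled are assumptions made for illustration (a real implementation would hook the GPU runtime rather than poll a callback):

```python
def check_memory(requested_gb, preset_limit_gb, max_gpu_memory_gb):
    """S60: refuse a request whose video memory exceeds the preset
    limit or the memory of every GPU in the cluster."""
    if requested_gb > preset_limit_gb or requested_gb > max_gpu_memory_gb:
        return False  # video memory allocation failure
    return True

def throttle(sample_usage_pct, core_limit_pct, running, waiting):
    """S61: one periodic check. If a wrapped thread's sampled core
    usage exceeds its limit, move it from the running set into the
    waiting queue; a scheduler would resume it later."""
    for thread in list(running):
        if sample_usage_pct(thread) > core_limit_pct:
            running.remove(thread)
            waiting.append(thread)
```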

Further, the method completes the scheduling of GPU resources for a single GPU application or for multiple GPU applications.

Compared with the prior art, the present invention brings the following technical effects:

1. A single GPU can be shared among multiple applications by percentage of GPU video memory and GPU computing power, which greatly improves the utilization efficiency of a single GPU and reduces the cost of GPU applications.

2. The topology between GPUs is considered when scheduling multiple GPUs, which maximizes the communication efficiency among the GPUs of a single application and improves that application's GPU performance.

3. When scheduling GPU applications in a Kubernetes cluster, centralized resource allocation is supported, i.e. machines that already host many GPU applications are used preferentially, ensuring that later multi-card applications can still be scheduled onto the cluster successfully.

The above are merely preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A GPU resource scheduling method, characterized by comprising the following steps:
S1, collecting basic information of the GPUs from the cluster, providing a gpu-usages interface, and entering step S2;
S2, creating a GPU application, sending an application request to a Kubernetes scheduler, and entering step S3;
S3, the Kubernetes scheduler traversing all GPU applications in the cluster after receiving the application request, and entering step S4;
S4, calculating the GPUs meeting the scheduling requirement of the application through the gpu-usages interface, and entering step S5;
and S5, the GPU manager binding the specified GPU resources to the application according to the machine where the GPU recorded on the application is located.
2. The GPU resource scheduling method as claimed in claim 1, wherein in step S2, in the process of creating the GPU application, the application provides the video memory value and computing power value it requires.
3. The GPU resource scheduling method as claimed in claim 1 or 2, wherein in step S1, the collected basic information of the GPUs includes the GPU model, video memory, and GPU cores.
4. The GPU resource scheduling method as claimed in claim 3, wherein in step S4, if no GPU in the cluster meets the scheduling requirement of the application, the method proceeds to step S6, GPU resource isolation.
5. The GPU resource scheduling method as claimed in claim 4, wherein S6 includes steps S60 and S61: S60, if the video memory required by the application exceeds the preset value or is greater than the video memory of all GPUs in the cluster, returning a video memory allocation failure; S61, wrapping the execution thread and periodically checking the program's GPU core usage, and if it exceeds the set core usage value or is greater than the video memory value of all GPUs in the cluster, moving the current execution thread into the waiting execution threads.
6. The GPU resource scheduling method as claimed in claim 1, 2, 4 or 5, wherein in step S2, the GPU model and the number of GPUs required should be provided in the process of creating the GPU application.
7. The GPU resource scheduling method as claimed in claim 6, wherein in step S4, the first GPU meeting the requirement is taken, and the name of the machine where the GPU is located and the index of the GPU within the machine are marked on the application.
8. The GPU resource scheduling method as claimed in claim 1, 2, 4, 5 or 7, wherein in step S4, machines with the corresponding number of idle GPUs are found through the gpu-usages interface, and the machine with the fewest idle GPUs is selected and its name added to the application.
9. The GPU resource scheduling method as claimed in claim 1, 2, 4, 5 or 7, wherein in step S5, the GPU manager allocates the GPUs to the application by exhaustive search, completing the scheduling and binding of GPU resources.
10. The GPU resource scheduling method as claimed in any of claims 1 to 9, wherein the method performs scheduling of GPU resources for one GPU application or a plurality of GPU applications.
CN202010576793.7A 2020-06-22 2020-06-22 A GPU resource scheduling method Pending CN111796932A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010576793.7A CN111796932A (en) 2020-06-22 2020-06-22 A GPU resource scheduling method

Publications (1)

Publication Number Publication Date
CN111796932A true CN111796932A (en) 2020-10-20

Family

ID=72803890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010576793.7A Pending CN111796932A (en) 2020-06-22 2020-06-22 A GPU resource scheduling method

Country Status (1)

Country Link
CN (1) CN111796932A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033001A (en) * 2018-07-17 2018-12-18 北京百度网讯科技有限公司 Method and apparatus for distributing GPU
CN110688218A (en) * 2019-09-05 2020-01-14 广东浪潮大数据研究有限公司 Resource scheduling method and device
CN111158879A (en) * 2019-12-31 2020-05-15 上海依图网络科技有限公司 System resource scheduling method, device, machine readable medium and system
CN111190718A (en) * 2020-01-07 2020-05-22 第四范式(北京)技术有限公司 Method, device and system for realizing task scheduling

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12028878B2 (en) 2020-11-12 2024-07-02 Samsung Electronics Co., Ltd. Method and apparatus for allocating GPU to software package
CN114500413A (en) * 2021-12-17 2022-05-13 阿里巴巴(中国)有限公司 Equipment connection method and device and equipment connection chip
CN114500413B (en) * 2021-12-17 2024-04-16 阿里巴巴(中国)有限公司 Device connection method and device, and device connection chip

Similar Documents

Publication Publication Date Title
CN104881325B (en) A kind of resource regulating method and resource scheduling system
CN107066319B (en) Multi-dimensional scheduling system for heterogeneous resources
CN107038069B (en) Dynamic label matching DLMS scheduling method under Hadoop platform
CN114356543B (en) A multi-tenant machine learning task resource scheduling method based on Kubernetes
CN107222531B (en) Container cloud resource scheduling method
CN104021040B (en) Based on the cloud computing associated task dispatching method and device under time constraint condition
CN110221920B (en) Deployment method, device, storage medium and system
CN104166597B (en) A kind of method and device for distributing long-distance inner
CN111538586A (en) Cluster GPU resource management and scheduling system, method, and computer-readable storage medium
CN102387173A (en) MapReduce system and method and device for scheduling tasks thereof
CN108337109A (en) A kind of resource allocation methods and device and resource allocation system
CN111092921B (en) Data acquisition method, device and storage medium
CN111694789A (en) Embedded reconfigurable heterogeneous determination method, system, storage medium and processor
CN101006427A (en) Processing management device, computer system, distributed processing method, and computer program
CN114968566A (en) Container scheduling method and device under shared GPU cluster
CN116089009A (en) A GPU resource management method, system, device and storage medium
CN106874115A (en) A kind of resources of virtual machine distribution method and distributed virtual machine resource scheduling system
CN107305505A (en) The operation method and virtual platform of virtual platform
WO2020108337A1 (en) Cpu resource scheduling method and electronic equipment
CN118656177A (en) A container scheduling method, device, medium and equipment based on GPU topology structure
JP7515710B2 (en) Resource Scheduling Method, System, Electronic Device and Computer-Readable Storage Medium
CN111796932A (en) A GPU resource scheduling method
CN119088569A (en) A method and device for directional scheduling of computing power in an intelligent computing center
CN120670331B (en) Address management method and device for heterogeneous system and computer equipment
WO2024022142A1 (en) Resource use method and apparatus

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201020)