CN109936604B - Resource scheduling method, device and system - Google Patents
Description
Technical Field

The present invention relates to the field of computers, and in particular to a resource scheduling method, a resource scheduling apparatus, and a resource scheduling system.
Background Art

At present, distributed computing cluster systems based on the master-worker model (for example, Docker container clusters) are increasingly widely used. Such a system comprises a master server and multiple worker hosts. The master server receives new tasks, allocates resources to them, and assigns them to worker hosts; the worker hosts receive the assigned tasks and execute them.

In such a distributed computing cluster system, when the master server allocates resources to a new task, it allocates all the resources of one or more whole GPUs (Graphics Processing Units) in a worker host to that single task; that is, one task exclusively occupies one or more whole GPUs.

When the master server receives a new task, it checks whether any worker host has a whole GPU that is not assigned to any task; if not, it waits until a running task finishes before allocating one or more whole GPUs to the new task. In practice, however, a task rarely uses 100% of its allocated GPU resources at all times; for example, a task may use only 30% or 50% of a whole GPU's resources for a long period, leaving the remaining resources of that GPU idle. The existing allocation scheme therefore cannot fully and reasonably utilize the resources of a whole GPU, and GPU resource utilization is low.
Summary of the Invention

In view of the above problems, the present invention provides a resource scheduling method, apparatus, and system to solve the technical problem of low GPU resource utilization in the prior art.

In a first aspect, an embodiment of the present invention provides a resource scheduling method applied to the master server of a distributed computing cluster in master-worker mode. The method includes:

monitoring the allocatable resources of each GPU in each host;

when a new task is received, determining the resources required by the new task;

determining, according to the allocatable resources of each GPU in the hosts, a target GPU whose allocatable resources satisfy the required resources;

allocating resources for the new task from the allocatable resources of the target GPU, and assigning the new task to the host where the target GPU is located.
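The four steps above can be sketched as a minimal master-side scheduler. The resource-pool layout (a dict of per-GPU allocatable amounts), the task format, and the best-fit tie-breaking rule are illustrative assumptions, not part of the patent text:

```python
# Minimal sketch of the master-side scheduling flow described above.
# The resource-pool layout and names are illustrative assumptions.

def schedule(resource_pool, task):
    """resource_pool: {(host, gpu_id): allocatable_amount}
    task: {"id": ..., "demand": amount needed on one GPU}
    Returns (host, gpu_id) on success, or None if no GPU fits."""
    demand = task["demand"]
    # Determine the candidate GPUs whose allocatable resources satisfy the demand.
    candidates = [k for k, free in resource_pool.items() if free >= demand]
    if not candidates:
        return None  # e.g. place the task in a blocking pool
    # Pick the candidate with the least allocatable resources (best fit).
    target = min(candidates, key=lambda k: resource_pool[k])
    resource_pool[target] -= demand  # allocate from the target GPU
    return target  # the task is then dispatched to target's host

pool = {("H1", "G1"): 30, ("H1", "G2"): 70, ("H2", "G1"): 50}
print(schedule(pool, {"id": "A3", "demand": 40}))  # → ('H2', 'G1')
print(pool[("H2", "G1")])  # → 10
```

Note that because the task is given only the amount it needs, the remaining 10 units of H2G1 stay allocatable for later tasks, which is the core point of the invention.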
In a second aspect, an embodiment of the present invention provides a resource scheduling method applied to a worker host of a distributed computing cluster in master-worker mode. The method includes:

determining the allocatable resources of each GPU in the host;

sending the allocatable resources of each GPU to the master server;

executing the tasks assigned by the master server.
In a third aspect, an embodiment of the present invention provides a resource scheduling apparatus deployed on the master server of a distributed computing cluster in master-worker mode. The apparatus includes:

a monitoring unit, configured to monitor the allocatable resources of each GPU in each host;

a parsing unit, configured to determine, when a new task is received, the resources required by the new task;

a determining unit, configured to determine, according to the allocatable resources of each GPU in the hosts, a target GPU whose allocatable resources satisfy the required resources;

an allocation unit, configured to allocate resources for the new task from the allocatable resources of the target GPU, and to assign the new task to the host corresponding to the target GPU.
In a fourth aspect, an embodiment of the present invention provides a resource scheduling apparatus deployed in a worker host of a distributed computing cluster in master-worker mode. The apparatus includes:

a resource determining unit, configured to determine the allocatable resources of each GPU in the host;

a communication unit, configured to send the allocatable resources of each GPU to the master server;

an execution unit, configured to execute the tasks assigned by the master server.
In a fifth aspect, an embodiment of the present invention provides a resource scheduling system, including a master server and multiple worker hosts each connected to the master server, wherein:

the master server is configured to monitor the allocatable resources of each GPU in each host; when a new task is received, determine the resources required by the new task; determine, according to the allocatable resources of each GPU in the hosts, a target GPU whose allocatable resources satisfy the required resources; allocate resources for the new task from the allocatable resources of the target GPU; and assign the new task to the host corresponding to the target GPU;

each host is configured to determine the allocatable resources of the GPUs in the host, send the allocatable resources to the master server, and execute the tasks assigned by the master server.
In the embodiments of the present invention, for a distributed computing cluster in master-worker mode, the master server monitors the allocatable resources of each GPU in each host. When a new task is received, the master does not allocate the whole of a GPU's resources to it; instead, it allocates from the GPU's allocatable resources an amount matching the task's requirements. With this technical solution, if a GPU still has allocatable resources left after resources have been allocated to running tasks, those remaining resources can be allocated to other tasks, so that multiple tasks share the same GPU. This makes full use of GPU resources and solves the prior-art problem of low GPU utilization caused by one task monopolizing a whole GPU. Moreover, since the same amount of GPU resources can serve more tasks than in the prior art, resources can be allocated to new tasks promptly when they arrive, improving overall task execution speed and efficiency.
Brief Description of the Drawings

The accompanying drawings are provided for a further understanding of the present invention and constitute a part of the specification; together with the embodiments, they serve to explain the present invention and do not limit it.
Fig. 1 is a schematic structural diagram of a resource scheduling system in an embodiment of the present invention;

Fig. 2 is the first schematic structural diagram of a resource scheduling apparatus deployed in the master server in an embodiment of the present invention;

Fig. 3 is a schematic diagram of the allocatable resource amounts of each GPU recorded in a resource pool in an embodiment of the present invention;

Fig. 4 is the second schematic structural diagram of a resource scheduling apparatus deployed in the master server in an embodiment of the present invention;

Fig. 5 is a schematic diagram of the task information, corresponding to a host, maintained in the task information maintenance unit in an embodiment of the present invention;

Fig. 6 is a schematic diagram of the task information of Fig. 5 after updating;

Fig. 7 is the first schematic structural diagram of the determining unit in an embodiment of the present invention;

Fig. 8 is the second schematic structural diagram of the determining unit in an embodiment of the present invention;

Fig. 9 is the third schematic structural diagram of the determining unit in an embodiment of the present invention;

Fig. 10 is a schematic structural diagram of a resource scheduling apparatus deployed in a worker host in an embodiment of the present invention;

Fig. 11 is a flowchart of the resource scheduling method performed on the master server in an embodiment of the present invention;

Fig. 12 is the first flowchart of an implementation of step 103 in Fig. 11;

Fig. 13 is the second flowchart of an implementation of step 103 in Fig. 11;

Fig. 14 is the third flowchart of an implementation of step 103 in Fig. 11;

Fig. 15 is the fourth flowchart of an implementation of step 103 in Fig. 11;

Fig. 16 is the fifth flowchart of an implementation of step 103 in Fig. 11;

Fig. 17 is a flowchart of the resource scheduling method performed on a worker host in an embodiment of the present invention.
Detailed Description of the Embodiments

To enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.

The technical solution of the present invention is applicable to all distributed computing clusters in master-worker mode, such as Docker container clusters and engine computing clusters; this application does not strictly limit the specific type of distributed computing cluster.
Embodiment 1

Fig. 1 is a schematic structural diagram of a resource scheduling system. The resource scheduling system is a distributed computing cluster in master-worker mode, comprising a master server and multiple worker hosts each communicatively connected to the master server.

Through a master program deployed on it, the master server implements the following functions: monitoring, in real time or periodically, the allocatable resources of each GPU in each host; receiving a new task and parsing its task parameters to obtain the resources it requires; determining, according to the allocatable resources of each GPU in each host, a target GPU whose allocatable resources satisfy the new task's requirements; allocating resources for the new task from the allocatable resources of the target GPU; and assigning the new task to the host where the target GPU is located, so that this host invokes the corresponding worker program to execute the new task.

Through a worker program deployed on it, each worker host implements the following functions: determining, in real time or periodically, the allocatable resources of each GPU on the host where the worker program runs; sending the allocatable resources of each GPU to the master server; and executing the tasks the master server assigns to the host.

In the embodiments of the present invention, there are various mechanisms by which a host sends the allocatable resources of its GPUs to the master server, and this application does not strictly limit them. For example, the worker program may periodically and actively synchronize the allocatable resources of each GPU on its host to the master server; alternatively, the master server may periodically send resource acquisition requests to each host, and the worker program on each host sends the allocatable resources of its GPUs to the master server upon receiving such a request; or the master server may periodically poll each host, and the worker program sends the allocatable resources of each GPU on its host when the host is polled.

To help those skilled in the art further understand the technical solution of the present invention, it is described in detail below from the perspectives of the master server and the worker host respectively.
Embodiment 2

The master program on the master server can implement the aforementioned functions through its subprogram, the scheduler (i.e., the resource scheduling apparatus). As shown in Fig. 2, the resource scheduling apparatus may include a monitoring unit 11, a parsing unit 12, a determining unit 13, and an allocation unit 14, wherein:

the monitoring unit 11 is configured to monitor the allocatable resources of each GPU in each host;

the parsing unit 12 is configured to determine, when a new task is received, the resources required by the new task.

In the embodiments of the present invention, when the parsing unit 12 receives a new task, it parses the task parameters of the new task with preset parsing rules to obtain the required resources. For example, the task parameters may include the identity of the new task (such as its name or ID) and the GPU resource information it needs (the number of GPUs and the amount of resources to occupy on each GPU).
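As a minimal sketch of this parsing step, assuming a simple dict-based task-parameter format (the field names are illustrative assumptions, since the patent does not fix a wire format):

```python
# Hypothetical task-parameter format; field names are assumptions for illustration.
def parse_task(params):
    """Extract the task identity and GPU resource demand from task parameters."""
    return {
        "task_id": params["id"],                    # name or ID of the new task
        "gpu_count": params["gpu"]["count"],        # number of GPUs needed
        "per_gpu_amount": params["gpu"]["amount"],  # resources to occupy on each GPU
    }

demand = parse_task({"id": "A3", "gpu": {"count": 2, "amount": 30}})
print(demand)  # → {'task_id': 'A3', 'gpu_count': 2, 'per_gpu_amount': 30}
```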
The determining unit 13 is configured to determine, according to the allocatable resources of each GPU in the hosts, a target GPU whose allocatable resources satisfy the required resources.

The allocation unit 14 is configured to allocate resources for the new task from the allocatable resources of the target GPU, and to assign the new task to the host corresponding to the target GPU.
In the embodiments of the present invention, the monitoring unit 11 may monitor the allocatable resources of each GPU in each host in, but not limited to, the following way:

The monitoring unit 11 establishes a resource pool in which the allocatable resource amount of each GPU in each host is recorded dynamically. As shown in Fig. 3, a host (denoted H1) contains three GPUs (denoted H1G1, H1G2, and H1G3), whose allocatable resource amounts are N11, N12, and N13 respectively. When the monitoring unit 11 receives from a host the allocatable resources of that host's GPUs, it updates the corresponding allocatable resource amounts in the resource pool accordingly.
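A minimal sketch of such a resource pool and its update-on-report behavior; the class name and report format are illustrative assumptions:

```python
# Resource pool keyed by (host, gpu); the worker report format is an assumption.
class ResourcePool:
    def __init__(self):
        self.allocatable = {}  # {(host, gpu_id): allocatable_amount}

    def update_from_report(self, host, report):
        """report: {gpu_id: allocatable_amount} sent by the worker program.
        Each report overwrites the recorded amounts for the GPUs it mentions."""
        for gpu_id, amount in report.items():
            self.allocatable[(host, gpu_id)] = amount

pool = ResourcePool()
pool.update_from_report("H1", {"G1": 40, "G2": 25, "G3": 60})  # N11, N12, N13
pool.update_from_report("H1", {"G2": 10})  # a later report shrinks G2's amount
print(pool.allocatable[("H1", "G2")])  # → 10
```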
Of course, those skilled in the art may monitor the allocatable resources of each GPU in each host in other ways, for example by maintaining a dynamic list that records the allocatable resource amount of each GPU in each host and updating that list in real time or periodically.

In the embodiments of the present invention, the determining unit 13 obtains the allocatable resources of each GPU in each host from the monitoring unit 11, so as to determine a target GPU whose allocatable resources satisfy the resources required by the new task.

Preferably, to keep the allocatable resource amounts in the resource pool up to date, after the allocation unit 14 allocates resources for a new task from a target GPU, it synchronizes the target GPU and the amount allocated to the new task to the monitoring unit 11, which promptly updates the target GPU's allocatable resource amount. Taking H1G1 of Fig. 3 as the target GPU: before the allocation, its allocatable resource amount is N11; after the allocation unit 14 allocates an amount M1 from it for the new task, its allocatable resources become N11 - M1.
Preferably, to further keep track of the task information in each host in a timely manner, the resource scheduling apparatus may further include a task information maintenance unit 15, as shown in Fig. 4, wherein:

the task information maintenance unit 15 is configured to record the task information corresponding to each host. The task information includes all running tasks on the host and the GPU resource information allocated to each running task; the GPU resource information includes the GPUs corresponding to each running task and the amount of resources the task occupies on each GPU.

As shown in Fig. 5, host H1 contains three GPUs (denoted H1G1, H1G2, and H1G3) and two tasks (denoted task A1 and task A2), wherein: task A1 corresponds to H1G1 and H1G2, with H1G1 allocating amount M11 and H1G2 allocating amount M12 to task A1; and task A2 corresponds to H1G3, with H1G3 allocating amount M21 to task A2.

Preferably, to keep the task information of each host up to date, in the embodiments of the present invention, after the allocation unit 14 allocates resources for a new task from a target GPU, it synchronizes the target GPU and the amount allocated to the new task to the task information maintenance unit 15, which updates the task information of the host where the target GPU is located. Taking H1G2 of Fig. 5 as the target GPU and denoting the new task as task 3, the updated task information of host H1 is shown in Fig. 6: task 3 is added, task 3 corresponds to H1G2, and the amount H1G2 allocates to task 3 is M31.

Preferably, in the embodiments of the present invention, after a host finishes executing a task, it releases the resources corresponding to that task, and synchronizes the task's completion status and the resource information the task occupied to the monitoring unit 11 and the task information maintenance unit 15, so that both units can update their information.
In the embodiments of the present invention, a GPU's allocatable resources may be its idle resources, its shareable resources, or both. A GPU's idle resources are the resources not allocated to any running task; a GPU's shareable resources are the portion of the resources already allocated to running tasks that is predicted to go unused by those tasks for a period of time. For example, taking H1G1 of Fig. 5: suppose the total resources of H1G1 amount to N1, and H1G1 currently hosts task A1 and task A2, with amounts M11 and M12 allocated to them respectively; if task A1 occupies only M11' and task A2 occupies only M12' over a period of time, then the idle resources of H1G1 are N1 - M11 - M12, and the shareable resources of H1G1 comprise (M11 - M11') and (M12 - M12'). Examples 1, 2, and 3 below describe the cases respectively.
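The idle/shareable split in this example can be computed directly. The helpers below are a sketch under the definitions just given; the concrete numbers are illustrative assumptions standing in for N1, M11, M11', M12, M12':

```python
def idle_resources(total, allocated):
    """Idle = total resources minus everything allocated to running tasks."""
    return total - sum(allocated.values())

def shareable_resources(allocated, used):
    """Shareable = the allocated-but-predicted-unused portion, per task."""
    return {t: allocated[t] - used[t] for t in allocated}

# H1G1 example: total N1=100; A1 allocated M11=40 but uses M11'=25;
# A2 allocated M12=50 but uses M12'=30 (numbers are illustrative).
alloc = {"A1": 40, "A2": 50}
used = {"A1": 25, "A2": 30}
print(idle_resources(100, alloc))        # → 10  (N1 - M11 - M12)
print(shareable_resources(alloc, used))  # → {'A1': 15, 'A2': 20}
```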
Example 1

In Example 1, a GPU's allocatable resources are its idle resources. The structure of the determining unit 13 is shown in Fig. 7 and includes a judging subunit 131 and a determining subunit 132, wherein:

the judging subunit 131 is configured to judge whether, among the GPUs of the hosts, there is any candidate GPU whose allocatable resources are greater than or equal to the required resources, and to trigger the determining subunit 132 if a candidate GPU exists;

the determining subunit 132 is configured to select one of the candidate GPUs as the target GPU.

The determining subunit 132 may select a candidate GPU at random, or may select the candidate GPU with the fewest allocatable resources; this application does not strictly limit the choice.

Preferably, when the judging subunit 131 determines that no candidate GPU exists and the new task has a high priority, to ensure that high-priority tasks can be executed in time, the judging subunit 131 is further configured to: if no candidate GPU exists, judge whether the hosts contain a preemptible task, i.e., a task whose priority is lower than the new task's and whose allocated resources are greater than or equal to the required resources; if a preemptible task exists, select one of them as the target task, allocate the target task's resources to the new task, and assign the new task to the host where the target task is located; if no preemptible task exists, place the new task in a preset blocking pool to wait for resource allocation.
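Example 1's decision order (candidate GPU first, then preemption, then the blocking pool) can be sketched as follows. The task/GPU record fields and the priority convention (higher number = higher priority) are illustrative assumptions:

```python
def place_task(gpus, running_tasks, blocking_pool, task):
    """gpus: {gpu: idle_amount}; running_tasks: list of
    {"id", "priority", "allocated", "host"}. Higher number = higher priority."""
    # Step 1: prefer a candidate GPU with enough idle resources.
    candidates = [g for g, idle in gpus.items() if idle >= task["demand"]]
    if candidates:
        return ("allocate", min(candidates, key=lambda g: gpus[g]))
    # Step 2: otherwise look for a preemptible lower-priority task
    # whose allocation covers the demand.
    preemptible = [t for t in running_tasks
                   if t["priority"] < task["priority"]
                   and t["allocated"] >= task["demand"]]
    if preemptible:
        return ("preempt", preemptible[0]["id"])
    # Step 3: no resources anywhere; wait in the blocking pool.
    blocking_pool.append(task)
    return ("blocked", None)

pool = []
gpus = {"H1G1": 10, "H1G2": 20}
running = [{"id": "A1", "priority": 1, "allocated": 40, "host": "H1"}]
print(place_task(gpus, running, pool, {"id": "B", "priority": 5, "demand": 30}))
# → ('preempt', 'A1')
```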
Example 2

In Example 2, a GPU's allocatable resources are its shareable resources. The structure of the determining unit 13 is as shown in Fig. 7, including the judging subunit 131 and the determining subunit 132, whose specific functions are as in Example 1 and are not repeated here.

Preferably, since a GPU's shareable resources are part of resources already allocated to a running task, that running task may need a larger amount of resources again after a period of time. To ensure that such running tasks can finish smoothly, the embodiments of the present invention stipulate that a GPU's shareable resources may only be allocated to a new task whose priority is lower than that of every running task on that GPU. In Example 2, therefore, the determining subunit 132 selects the target GPU specifically as follows: from the candidate GPUs, select one on which all running tasks have a higher priority than the new task.

Preferably, when the judging subunit 131 determines that no candidate GPU exists and the new task has a high priority, to ensure that high-priority tasks can be executed in time, the judging subunit 131 is further configured to: judge whether the hosts contain a preemptible task whose priority is lower than the new task's and whose allocated resources are greater than or equal to the required resources; if a preemptible task exists, select one as the target task, allocate its resources to the new task, and assign the new task to the host where the target task is located; if no preemptible task exists, place the new task in the preset blocking pool to wait for resource allocation.

In the embodiments of the present invention, the master program may periodically select from the blocking pool the task with the highest priority, or the task that has been in the pool the longest, and send the selected task to the parsing unit 12 as a new task.
Example 3

In Example 3, a GPU's allocatable resources are its idle resources and its shareable resources. The structure of the determining unit 13 is shown in Fig. 8 and includes a first judging subunit 133, a first determining subunit 134, a second judging subunit 135, and a second determining subunit 136, wherein:

the first judging subunit 133 is configured to judge whether, among the GPUs of the hosts, there is any first candidate GPU whose idle resources are greater than or equal to the required resources; if a first candidate GPU exists, trigger the first determining subunit 134; otherwise, trigger the second judging subunit 135;

the first determining subunit 134 is configured to select one of the first candidate GPUs as the target GPU;

the second judging subunit 135 is configured to judge whether, among the GPUs of the hosts, there is any second candidate GPU whose shareable resources are greater than or equal to the required resources; if a second candidate GPU exists, trigger the second determining subunit 136;

the second determining subunit 136 is configured to select one of the second candidate GPUs as the target GPU.

Preferably, the second determining subunit 136 is specifically configured to: from the second candidate GPUs, select one on which all running tasks have a higher priority than the new task.

Preferably, when the first judging subunit 133 determines that no first candidate GPU exists and the new task has a high priority, to ensure that high-priority tasks can be executed in time, the first judging subunit 133 is further configured to: judge whether the hosts contain a preemptible task whose priority is lower than the new task's and whose allocated resources are greater than or equal to the required resources; if a preemptible task exists, select one as the target task, allocate its resources to the new task, and assign the new task to the host where the target task is located; if no preemptible task exists, trigger the second judging subunit 135.

Preferably, to ensure that the new task can be executed in time, the determining unit 13 shown in Fig. 8 may further include a third judging subunit 137 and a third determining subunit 138, as shown in Fig. 9, wherein:

the second judging subunit 135 is further configured to trigger the third judging subunit 137 when no second candidate GPU exists;

the third judging subunit 137 is configured to judge whether, among the GPUs of the hosts, there is any third candidate GPU on which the sum of idle resources and shareable resources is greater than or equal to the required resources; if a third candidate GPU exists, trigger the third determining subunit 138; otherwise, place the new task in the preset blocking pool to wait for resource allocation;

the third determining subunit 138 is configured to select one of the third candidate GPUs as the target GPU.

Preferably, in the embodiments of the present invention, the third determining subunit 138 is specifically configured to: from the third candidate GPUs, select one on which all running tasks have a higher priority than the new task.
Embodiment 3

In Embodiment 3 of the present invention, the worker program in the worker host may be implemented by the resource scheduling apparatus shown in Fig. 10, which includes a resource determining unit 21, a communication unit 22, and an execution unit 23, wherein:

the resource determining unit 21 is configured to determine the allocatable resources of each GPU in the host;

the communication unit 22 is configured to send the allocatable resources of each GPU to the master server;

the execution unit 23 is configured to execute the tasks the master server assigns to the host.
Preferably, in Embodiment 3 of the present invention, a GPU's allocatable resources may be its idle resources, its shareable resources, or both.

In one instance, the allocatable resources are the GPU's idle resources, and the resource determining unit 21 is specifically configured to: monitor, for each GPU in the host, the idle resources not allocated to any running task, and treat the idle resources as the allocatable resources.

In another instance, the allocatable resources are the GPU's shareable resources, and the resource determining unit 21 is specifically configured to: predict, among the resources of each GPU already allocated to running tasks, the shareable portion that will go unused by those tasks for a period of time, and treat the shareable resources as the allocatable resources.

In yet another instance, the allocatable resources are the GPU's idle resources and shareable resources, and the resource determining unit 21 is specifically configured to: monitor the idle resources of each GPU not allocated to any running task, predict the shareable portion of the resources already allocated to running tasks that will go unused for a period of time, and treat both the idle resources and the shareable resources as the allocatable resources.

In the embodiments of the present invention, the resource determining unit 21 predicts the shareable resources as follows: by monitoring the resource utilization of each running task on each GPU over a historical time window, it predicts each running task's utilization over a coming period, and treats the portion predicted to go unused as shareable. For example, suppose a GPU hosts one running task A with an allocated GPU resource amount of M, and monitoring shows that task A's utilization has stayed below 50% throughout a period T; it can then be predicted that task A's utilization will still not exceed 50% in the next period, and 50% of task A's allocated amount M is confirmed as shareable resources for that coming period.
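A minimal sketch of this utilization-based prediction, under the 50%-threshold example above; the sampling-window format and threshold parameter are illustrative assumptions:

```python
def predict_shareable(allocated, utilization_history, threshold=0.5):
    """If a task's sampled utilization stayed below `threshold` over the whole
    history window, declare the unused fraction of its allocation shareable."""
    if utilization_history and max(utilization_history) < threshold:
        return allocated * (1 - threshold)  # e.g. 50% of M
    return 0.0  # no confident prediction; nothing is shareable

# Task A: allocated M = 40; sampled utilization stayed below 50% over period T.
print(predict_shareable(40, [0.31, 0.28, 0.45, 0.40]))  # → 20.0
print(predict_shareable(40, [0.31, 0.80, 0.45]))        # → 0.0
```

A conservative rule like this errs on the side of the running task: any sample above the threshold means no resources are offered for sharing.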
Preferably, the execution unit 23 is specifically configured to: upon receiving a first instruction to execute a new task using a target GPU's idle resources, execute the new task using the target GPU's idle resources; and upon receiving a second instruction to execute a new task using a target GPU's shareable resources, execute the new task using the target GPU's shareable resources.

Preferably, the execution unit 23 is further configured to: when it detects that a high-priority task on a GPU needs more resources, stop running the low-priority task on that GPU and allocate the shareable resources that had been allocated to the low-priority task to the high-priority task.
Embodiment 4

Based on the resource scheduling apparatus of Embodiment 2, Embodiment 4 of the present invention provides a resource scheduling method applied to the master server of a distributed computing cluster in master-worker mode. The flowchart of the method is shown in Fig. 11 and includes:

Step 101: monitor the allocatable resources of each GPU in each host;

Step 102: when a new task is received, determine the resources required by the new task;

Step 103: determine, according to the allocatable resources of each GPU in the hosts, a target GPU whose allocatable resources satisfy the required resources;

Step 104: allocate resources for the new task from the allocatable resources of the target GPU, and assign the new task to the host where the target GPU is located.
In one specific instance, the allocatable resources are a GPU's idle resources, or the allocatable resources are a GPU's shareable resources; step 103 may then be implemented as shown in Fig. 12:

Step A1: judge whether, among the GPUs of the hosts, there is any candidate GPU whose allocatable resources are greater than or equal to the required resources; if a candidate GPU exists, perform step A2;

Step A2: select one of the candidate GPUs as the target GPU.

Preferably, when the allocatable resources are a GPU's shareable resources, step A2 specifically includes: from the candidate GPUs, select one on which all running tasks have a higher priority than the new task.

Preferably, step A1 of the flowchart shown in Fig. 12 further includes the following: if no candidate GPU exists, perform steps A3 to A5, as shown in Fig. 13:

Step A3: judge whether the hosts contain a preemptible task whose priority is lower than the new task's and whose allocated resources are greater than or equal to the required resources; if a preemptible task exists, perform step A4; otherwise, perform step A5;

Step A4: select one of the preemptible tasks as the target task, allocate the target task's resources to the new task, and assign the new task to the host where the target task is located;

Step A5: place the new task in the preset blocking pool to wait for resource allocation.
在另一个实例中,可分配资源包括GPU中的空闲资源和可共享资源,所述步骤103具体实现可如图14所示,包括:In another example, the allocatable resources include idle resources and sharable resources in the GPU. The specific implementation of
步骤B1、判断宿主机的各GPU中是否存在空闲资源大于等于所述需求资源的第一候选GPU;若存在第一候选GPU则执行步骤B2,若不存在第一候选GPU则执行步骤B3;Step B1, determine whether there is a first candidate GPU with idle resources greater than or equal to the required resource in each GPU of the host machine; if there is a first candidate GPU, then perform step B2, if there is no first candidate GPU, then perform step B3;
步骤B2、从所述第一候选GPU中选取一个GPU作为目标GPU;Step B2, select a GPU as the target GPU from the first candidate GPU;
步骤B3、判断宿主机的各GPU中是否存在可共享资源大于等于所述需求资源的第二候选GPU;若存在第二候选GPU则执行步骤B4;Step B3, judging whether there is a second candidate GPU whose shareable resource is greater than or equal to the required resource in each GPU of the host machine; if there is a second candidate GPU, step B4 is performed;
步骤B4、从所述第二候选GPU中选取一个GPU作为目标GPU。Step B4: Select one GPU from the second candidate GPU as the target GPU.
步骤B4具体用于:从所述第二候选GPU中选取一个包含的执行中任务的优先级均高于新任务的GPU作为目标GPU。Step B4 is specifically configured to: select a GPU whose priority of tasks in execution are higher than that of the new task from the second candidate GPU as the target GPU.
优选地,为确保高优先级的任务能够及时执行,在执行图14所示流程的步骤B3之前,可进一步包括步骤B5~步骤B6,如图15所示:Preferably, in order to ensure that high-priority tasks can be executed in time, before step B3 of the process shown in FIG. 14 is executed, steps B5 to B6 may be further included, as shown in FIG. 15 :
步骤B5、判断宿主机中是否存在优先级低于所述新任务、且分配的资源大于等于所述需求资源的可抢占任务;若存在可抢占任务则执行步骤B6,若不存在可抢占任务则执行步骤B3;Step B5, determine whether there is a preemptible task in the host machine with a priority lower than the new task and the allocated resource is greater than or equal to the required resource; if there is a preemptible task, perform step B6, if there is no preemptible task, then Execute step B3;
步骤B6、从所述可抢占任务中选取一个目标任务,将所述目标任务的资源分配给所述新任务,并将所述新任务分配给所述目标任务所在的宿主机。Step B6: Select a target task from the preemptible tasks, allocate the resources of the target task to the new task, and allocate the new task to the host machine where the target task is located.
Preferably, in the flows shown in FIG. 14 and FIG. 15, if no second candidate GPU exists, steps B7 and B8 may further be performed; FIG. 16 shows the flow of FIG. 15 extended with steps B7 and B8:

Step B7: determine whether any GPU of the host is a third candidate GPU whose idle resources and shareable resources together are greater than or equal to the required resources; if a third candidate GPU exists, perform step B8; otherwise, place the new task in a blocking pool to wait for resources.

Step B8: select one GPU from the third candidate GPUs as the target GPU.

Preferably, step B8 may specifically select, from the third candidate GPUs, a GPU on which every executing task has a higher priority than the new task as the target GPU.
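The fallback of steps B7-B8, including the blocking pool, can be sketched as follows (again illustrative only; the dict fields and the tie-breaking when no GPU satisfies the preferred priority criterion are assumptions):

```python
def select_third_candidate(gpus, required, new_task_priority, blocking_pool, task):
    """Steps B7-B8: when neither idle nor shareable resources alone suffice,
    consider their sum; if no GPU qualifies, park the task in the blocking pool."""
    # Step B7: third-candidate GPUs (idle + shareable >= demand)
    third = [g for g in gpus if g["idle"] + g["shareable"] >= required]
    if not third:
        blocking_pool.append(task)   # wait here until resources are freed
        return None
    # Step B8 (preferred variant): favour a GPU whose running tasks all
    # outrank the new task; otherwise take any third candidate
    for g in third:
        if all(p > new_task_priority for p in g["priorities"]):
            return g
    return third[0]
```

Tasks in the blocking pool would be re-examined by the scheduler when resources are released, though the text does not spell out that retry mechanism.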
Embodiment 5

Based on the same concept as the resource scheduling apparatus of Embodiment 3, Embodiment 5 of the present invention provides a resource scheduling method applicable to a worker-side host in a distributed computing cluster in master-worker mode. As shown in FIG. 17, the method includes:
Step 201: determine the allocatable resources of each GPU of the host.

Step 202: send the allocatable resources of each GPU to the master-side server.

Step 203: execute the tasks assigned to the host by the master-side server.
In one example, a GPU's allocatable resources are its idle resources, and step 201 is specifically implemented as follows: monitor, on each GPU of the host, the idle resources not allocated to any executing task, and use these idle resources as the allocatable resources.

In another example, a GPU's allocatable resources are its shareable resources, and step 201 is specifically implemented as follows: among the resources of each GPU already allocated to executing tasks, predict the shareable resources that will not be used by those tasks over a period of time, and use these shareable resources as the allocatable resources.

In yet another example, a GPU's allocatable resources are its idle resources and its shareable resources, and step 201 is specifically implemented as follows: monitor, on each GPU of the host, the idle resources not allocated to any executing task; predict, among the resources already allocated to executing tasks, the shareable resources that will not be used by those tasks over a period of time; and use the idle resources and the shareable resources together as the allocatable resources.
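The combined variant of step 201 can be sketched as below. How the unused share of each task's allocation is predicted is not specified by the text, so the sketch assumes a precomputed `predicted_use` value per task; all field names are hypothetical:

```python
def allocatable_resources(gpus):
    """Step 201 (combined variant): for each GPU, report the idle units not
    allocated to any executing task plus the shareable units -- allocated to
    tasks but predicted to go unused over a period of time."""
    report = {}
    for gpu_id, gpu in gpus.items():
        allocated = sum(t["allocated"] for t in gpu["tasks"])
        idle = gpu["total"] - allocated            # monitored idle resources
        shareable = sum(max(0, t["allocated"] - t["predicted_use"])
                        for t in gpu["tasks"])     # stand-in for the predictor
        report[gpu_id] = {"idle": idle, "shareable": shareable}
    return report  # step 202: the worker sends this report to the master server
```

The returned per-GPU report is exactly what step 202 would transmit, giving the master the two resource pools it needs for the candidate checks of steps B1-B8.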
Preferably, step 203 specifically includes: upon receiving a first instruction to execute a new task using the idle resources of a target GPU, executing the new task with the target GPU's idle resources; and upon receiving a second instruction to execute a new task using the shareable resources of a target GPU, executing the new task with the target GPU's shareable resources.

Preferably, step 203 further includes: upon detecting that a high-priority task on a GPU needs more resources, stopping a low-priority task on that GPU and allocating the shareable resources assigned to the low-priority task to the high-priority task.
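The worker-side behaviour of step 203, including the reclaim path, can be sketched as follows (illustrative only; the instruction format and dict fields are assumptions for the sketch):

```python
def execute_instruction(gpu, instruction):
    """Step 203: run the new task on the resource pool named by the
    instruction -- idle resources for a first instruction, shareable
    resources for a second instruction."""
    pool = "idle" if instruction["kind"] == "first" else "shareable"
    gpu[pool] -= instruction["demand"]            # charge the chosen pool
    gpu["tasks"].append({"name": instruction["task"], "pool": pool,
                         "demand": instruction["demand"], "running": True})

def reclaim_for_high_priority(gpu, low_task, high_task):
    """Step 203 (further): when a high-priority task on the GPU needs more
    resources, stop the low-priority task and hand its units back."""
    low_task["running"] = False                   # stop the low-priority task
    high_task["demand"] += low_task["demand"]     # units return to their owner
```

A task running on shareable resources is thus inherently best-effort: it borrows capacity that the owning high-priority task may take back at any time.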
The basic principles of the present invention have been described above with reference to specific embodiments. Those of ordinary skill in the art will understand that all or any of the steps or components of the method and apparatus of the present invention may be implemented in hardware, firmware, software, or a combination thereof, on any computing device (including processors, storage media, and the like) or network of computing devices, using no more than basic programming skills after reading this description.

Those of ordinary skill in the art will also understand that all or part of the steps of the above method embodiments may be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, may each exist physically as a separate unit, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the device to produce a computer-implemented process, such that the instructions executed on the device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although the above embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, may make further changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the above embodiments together with all changes and modifications that fall within the scope of the present invention.

It will be apparent to those skilled in the art that various changes and variations can be made to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to include them as well.
Claims (7)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN201711362963.6A (CN109936604B) | 2017-12-18 | 2017-12-18 | Resource scheduling method, device and system |
Publications (2)

| Publication Number | Publication Date |
| --- | --- |
| CN109936604A | 2019-06-25 |
| CN109936604B | 2022-07-26 |
Family
ID=66982307
Legal Events

| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| 2020-03-27 | TA01 | Transfer of patent application right | Applicant changed from BEIJING TUSEN WEILAI TECHNOLOGY Co.,Ltd. to BEIJING TUSEN ZHITU TECHNOLOGY Co.,Ltd.; address unchanged: No. 1, Road Two, Shunyi Park, Zhongguancun Science and Technology Park, Shunyi District, Beijing, 101300 |
| | GR01 | Patent grant | |
| | CP03 | Change of name, title or address | Patentee changed from BEIJING TUSEN ZHITU TECHNOLOGY Co.,Ltd. to Beijing Original Generation Technology Co.,Ltd.; address and country (China) unchanged |