CN109936604B - Resource scheduling method, device and system - Google Patents
Description
Technical Field

The present invention relates to the field of computers, and in particular to a resource scheduling method, a resource scheduling apparatus, and a resource scheduling system.
Background Art

At present, distributed computing cluster systems based on the master-worker model (for example, Docker container clusters) are increasingly widely used. Such a system comprises a master server and multiple worker hosts. The master server receives new tasks, allocates resources to them, and assigns them to worker hosts; the worker hosts receive the assigned tasks and execute them.

In such a distributed computing cluster system, when the master server allocates resources to a new task, it allocates all the resources of one or more whole GPUs (Graphics Processing Units) in a worker host to that single task; that is, one task exclusively occupies one or more whole GPUs.

When the master server receives a new task, it checks whether any worker host has a whole GPU that is not assigned to any task; if not, it waits until a running task finishes before allocating one or more whole GPUs to the new task. In practice, however, a task rarely uses 100% of its allocated GPU resources at all times; for example, a task may use only 30% or 50% of a whole GPU's resources for a long period, leaving the remaining resources of that GPU idle. The existing allocation scheme therefore cannot fully and reasonably utilize the resources of a whole GPU, and GPU resource utilization is low.
Summary of the Invention

In view of the above problems, the present invention provides a resource scheduling method, apparatus, and system to solve the technical problem of low GPU resource utilization in the prior art.

In a first aspect, an embodiment of the present invention provides a resource scheduling method applied to the master server of a distributed computing cluster in master-worker mode. The method includes:

monitoring the allocatable resources of each GPU in each host;

when a new task is received, determining the resources required by the new task;

determining, according to the allocatable resources of each GPU in the hosts, a target GPU whose allocatable resources satisfy the required resources;

allocating resources for the new task from the allocatable resources of the target GPU, and assigning the new task to the host where the target GPU is located.
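The four steps above can be sketched as a minimal master-side scheduler. The resource-pool layout (a dict of per-GPU allocatable amounts), the task format, and the best-fit tie-breaking rule are illustrative assumptions, not part of the patent text:

```python
# Minimal sketch of the master-side scheduling flow described above.
# The resource-pool layout and names are illustrative assumptions.

def schedule(resource_pool, task):
    """resource_pool: {(host, gpu_id): allocatable_amount}
    task: {"id": ..., "demand": amount needed on one GPU}
    Returns (host, gpu_id) on success, or None if no GPU fits."""
    demand = task["demand"]
    # Determine the candidate GPUs whose allocatable resources satisfy the demand.
    candidates = [k for k, free in resource_pool.items() if free >= demand]
    if not candidates:
        return None  # e.g. place the task in a blocking pool
    # Pick the candidate with the least allocatable resources (best fit).
    target = min(candidates, key=lambda k: resource_pool[k])
    resource_pool[target] -= demand  # allocate from the target GPU
    return target  # the task is then dispatched to target's host

pool = {("H1", "G1"): 30, ("H1", "G2"): 70, ("H2", "G1"): 50}
print(schedule(pool, {"id": "A3", "demand": 40}))  # → ('H2', 'G1')
print(pool[("H2", "G1")])  # → 10
```

Note that because the task is given only the amount it needs, the remaining 10 units of H2G1 stay allocatable for later tasks, which is the core point of the invention.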
In a second aspect, an embodiment of the present invention provides a resource scheduling method applied to a worker host of a distributed computing cluster in master-worker mode. The method includes:

determining the allocatable resources of each GPU in the host;

sending the allocatable resources of each GPU to the master server;

executing the tasks assigned by the master server.
In a third aspect, an embodiment of the present invention provides a resource scheduling apparatus deployed on the master server of a distributed computing cluster in master-worker mode. The apparatus includes:

a monitoring unit, configured to monitor the allocatable resources of each GPU in each host;

a parsing unit, configured to determine, when a new task is received, the resources required by the new task;

a determining unit, configured to determine, according to the allocatable resources of each GPU in the hosts, a target GPU whose allocatable resources satisfy the required resources;

an allocation unit, configured to allocate resources for the new task from the allocatable resources of the target GPU, and to assign the new task to the host corresponding to the target GPU.
In a fourth aspect, an embodiment of the present invention provides a resource scheduling apparatus deployed in a worker host of a distributed computing cluster in master-worker mode. The apparatus includes:

a resource determining unit, configured to determine the allocatable resources of each GPU in the host;

a communication unit, configured to send the allocatable resources of each GPU to the master server;

an execution unit, configured to execute the tasks assigned by the master server.
In a fifth aspect, an embodiment of the present invention provides a resource scheduling system, including a master server and multiple worker hosts each connected to the master server, wherein:

the master server is configured to monitor the allocatable resources of each GPU in each host; when a new task is received, determine the resources required by the new task; determine, according to the allocatable resources of each GPU in the hosts, a target GPU whose allocatable resources satisfy the required resources; allocate resources for the new task from the allocatable resources of the target GPU; and assign the new task to the host corresponding to the target GPU;

each host is configured to determine the allocatable resources of the GPUs in the host, send the allocatable resources to the master server, and execute the tasks assigned by the master server.
In the embodiments of the present invention, for a distributed computing cluster in master-worker mode, the master server monitors the allocatable resources of each GPU in each host. When a new task is received, the master does not allocate the whole of a GPU's resources to it; instead, it allocates from the GPU's allocatable resources an amount matching the task's requirements. With this technical solution, if a GPU still has allocatable resources left after resources have been allocated to running tasks, those remaining resources can be allocated to other tasks, so that multiple tasks share the same GPU. This makes full use of GPU resources and solves the prior-art problem of low GPU utilization caused by one task monopolizing a whole GPU. Moreover, since the same amount of GPU resources can serve more tasks than in the prior art, resources can be allocated to new tasks promptly when they arrive, improving overall task execution speed and efficiency.
Brief Description of the Drawings

The accompanying drawings are provided for a further understanding of the present invention and constitute a part of the specification; together with the embodiments, they serve to explain the present invention and do not limit it.
Fig. 1 is a schematic structural diagram of a resource scheduling system in an embodiment of the present invention;

Fig. 2 is the first schematic structural diagram of a resource scheduling apparatus deployed in the master server in an embodiment of the present invention;

Fig. 3 is a schematic diagram of the allocatable resource amounts of each GPU recorded in a resource pool in an embodiment of the present invention;

Fig. 4 is the second schematic structural diagram of a resource scheduling apparatus deployed in the master server in an embodiment of the present invention;

Fig. 5 is a schematic diagram of the task information, corresponding to a host, maintained in the task information maintenance unit in an embodiment of the present invention;

Fig. 6 is a schematic diagram of the task information of Fig. 5 after updating;

Fig. 7 is the first schematic structural diagram of the determining unit in an embodiment of the present invention;

Fig. 8 is the second schematic structural diagram of the determining unit in an embodiment of the present invention;

Fig. 9 is the third schematic structural diagram of the determining unit in an embodiment of the present invention;

Fig. 10 is a schematic structural diagram of a resource scheduling apparatus deployed in a worker host in an embodiment of the present invention;

Fig. 11 is a flowchart of the resource scheduling method performed on the master server in an embodiment of the present invention;

Fig. 12 is the first flowchart of an implementation of step 103 in Fig. 11;

Fig. 13 is the second flowchart of an implementation of step 103 in Fig. 11;

Fig. 14 is the third flowchart of an implementation of step 103 in Fig. 11;

Fig. 15 is the fourth flowchart of an implementation of step 103 in Fig. 11;

Fig. 16 is the fifth flowchart of an implementation of step 103 in Fig. 11;

Fig. 17 is a flowchart of the resource scheduling method performed on a worker host in an embodiment of the present invention.
Detailed Description of the Embodiments

To enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.

The technical solution of the present invention is applicable to all distributed computing clusters in master-worker mode, such as Docker container clusters and engine computing clusters; this application does not strictly limit the specific type of distributed computing cluster.
Embodiment 1

Fig. 1 is a schematic structural diagram of a resource scheduling system. The resource scheduling system is a distributed computing cluster in master-worker mode, comprising a master server and multiple worker hosts each communicatively connected to the master server.

Through a master program deployed on it, the master server implements the following functions: monitoring, in real time or periodically, the allocatable resources of each GPU in each host; receiving a new task and parsing its task parameters to obtain the resources it requires; determining, according to the allocatable resources of each GPU in each host, a target GPU whose allocatable resources satisfy the new task's requirements; allocating resources for the new task from the allocatable resources of the target GPU; and assigning the new task to the host where the target GPU is located, so that this host invokes the corresponding worker program to execute the new task.

Through a worker program deployed on it, each worker host implements the following functions: determining, in real time or periodically, the allocatable resources of each GPU on the host where the worker program runs; sending the allocatable resources of each GPU to the master server; and executing the tasks the master server assigns to the host.

In the embodiments of the present invention, there are various mechanisms by which a host sends the allocatable resources of its GPUs to the master server, and this application does not strictly limit them. For example, the worker program may periodically and actively synchronize the allocatable resources of each GPU on its host to the master server; alternatively, the master server may periodically send resource acquisition requests to each host, and the worker program on each host sends the allocatable resources of its GPUs to the master server upon receiving such a request; or the master server may periodically poll each host, and the worker program sends the allocatable resources of each GPU on its host when the host is polled.

To help those skilled in the art further understand the technical solution of the present invention, it is described in detail below from the perspectives of the master server and the worker host respectively.
Embodiment 2

The master program on the master server can implement the aforementioned functions through its subprogram, the scheduler (i.e., the resource scheduling apparatus). As shown in Fig. 2, the resource scheduling apparatus may include a monitoring unit 11, a parsing unit 12, a determining unit 13, and an allocation unit 14, wherein:

the monitoring unit 11 is configured to monitor the allocatable resources of each GPU in each host;

the parsing unit 12 is configured to determine, when a new task is received, the resources required by the new task.

In the embodiments of the present invention, when the parsing unit 12 receives a new task, it parses the task parameters of the new task with preset parsing rules to obtain the required resources. For example, the task parameters may include the identity of the new task (such as its name or ID) and the GPU resource information it needs (the number of GPUs and the amount of resources to occupy on each GPU).
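As a minimal sketch of this parsing step, assuming a simple dict-based task-parameter format (the field names are illustrative assumptions, since the patent does not fix a wire format):

```python
# Hypothetical task-parameter format; field names are assumptions for illustration.
def parse_task(params):
    """Extract the task identity and GPU resource demand from task parameters."""
    return {
        "task_id": params["id"],                    # name or ID of the new task
        "gpu_count": params["gpu"]["count"],        # number of GPUs needed
        "per_gpu_amount": params["gpu"]["amount"],  # resources to occupy on each GPU
    }

demand = parse_task({"id": "A3", "gpu": {"count": 2, "amount": 30}})
print(demand)  # → {'task_id': 'A3', 'gpu_count': 2, 'per_gpu_amount': 30}
```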
The determining unit 13 is configured to determine, according to the allocatable resources of each GPU in the hosts, a target GPU whose allocatable resources satisfy the required resources.

The allocation unit 14 is configured to allocate resources for the new task from the allocatable resources of the target GPU, and to assign the new task to the host corresponding to the target GPU.
In the embodiments of the present invention, the monitoring unit 11 may monitor the allocatable resources of each GPU in each host in, but not limited to, the following way:

The monitoring unit 11 establishes a resource pool in which the allocatable resource amount of each GPU in each host is recorded dynamically. As shown in Fig. 3, a host (denoted H1) contains three GPUs (denoted H1G1, H1G2, and H1G3), whose allocatable resource amounts are N11, N12, and N13 respectively. When the monitoring unit 11 receives from a host the allocatable resources of that host's GPUs, it updates the corresponding allocatable resource amounts in the resource pool accordingly.
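A minimal sketch of such a resource pool and its update-on-report behavior; the class name and report format are illustrative assumptions:

```python
# Resource pool keyed by (host, gpu); the worker report format is an assumption.
class ResourcePool:
    def __init__(self):
        self.allocatable = {}  # {(host, gpu_id): allocatable_amount}

    def update_from_report(self, host, report):
        """report: {gpu_id: allocatable_amount} sent by the worker program.
        Each report overwrites the recorded amounts for the GPUs it mentions."""
        for gpu_id, amount in report.items():
            self.allocatable[(host, gpu_id)] = amount

pool = ResourcePool()
pool.update_from_report("H1", {"G1": 40, "G2": 25, "G3": 60})  # N11, N12, N13
pool.update_from_report("H1", {"G2": 10})  # a later report shrinks G2's amount
print(pool.allocatable[("H1", "G2")])  # → 10
```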
Of course, those skilled in the art may monitor the allocatable resources of each GPU in each host in other ways, for example by maintaining a dynamic list that records the allocatable resource amount of each GPU in each host and updating that list in real time or periodically.

In the embodiments of the present invention, the determining unit 13 obtains the allocatable resources of each GPU in each host from the monitoring unit 11, so as to determine a target GPU whose allocatable resources satisfy the resources required by the new task.

Preferably, to keep the allocatable resource amounts in the resource pool up to date, after the allocation unit 14 allocates resources for a new task from a target GPU, it synchronizes the target GPU and the amount allocated to the new task to the monitoring unit 11, which promptly updates the target GPU's allocatable resource amount. Taking H1G1 of Fig. 3 as the target GPU: before the allocation, its allocatable resource amount is N11; after the allocation unit 14 allocates an amount M1 from it for the new task, its allocatable resources become N11 - M1.
Preferably, to further keep track of the task information in each host in a timely manner, the resource scheduling apparatus may further include a task information maintenance unit 15, as shown in Fig. 4, wherein:

the task information maintenance unit 15 is configured to record the task information corresponding to each host. The task information includes all running tasks on the host and the GPU resource information allocated to each running task; the GPU resource information includes the GPUs corresponding to each running task and the amount of resources the task occupies on each GPU.

As shown in Fig. 5, host H1 contains three GPUs (denoted H1G1, H1G2, and H1G3) and two tasks (denoted task A1 and task A2), wherein: task A1 corresponds to H1G1 and H1G2, with H1G1 allocating amount M11 and H1G2 allocating amount M12 to task A1; and task A2 corresponds to H1G3, with H1G3 allocating amount M21 to task A2.

Preferably, to keep the task information of each host up to date, in the embodiments of the present invention, after the allocation unit 14 allocates resources for a new task from a target GPU, it synchronizes the target GPU and the amount allocated to the new task to the task information maintenance unit 15, which updates the task information of the host where the target GPU is located. Taking H1G2 of Fig. 5 as the target GPU and denoting the new task as task 3, the updated task information of host H1 is shown in Fig. 6: task 3 is added, task 3 corresponds to H1G2, and the amount H1G2 allocates to task 3 is M31.

Preferably, in the embodiments of the present invention, after a host finishes executing a task, it releases the resources corresponding to that task, and synchronizes the task's completion status and the resource information the task occupied to the monitoring unit 11 and the task information maintenance unit 15, so that both units can update their information.
In the embodiments of the present invention, a GPU's allocatable resources may be its idle resources, its shareable resources, or both. A GPU's idle resources are the resources not allocated to any running task; a GPU's shareable resources are the portion of the resources already allocated to running tasks that is predicted to go unused by those tasks for a period of time. For example, taking H1G1 of Fig. 5: suppose the total resources of H1G1 amount to N1, and H1G1 currently hosts task A1 and task A2, with amounts M11 and M12 allocated to them respectively; if task A1 occupies only M11' and task A2 occupies only M12' over a period of time, then the idle resources of H1G1 are N1 - M11 - M12, and the shareable resources of H1G1 comprise (M11 - M11') and (M12 - M12'). Examples 1, 2, and 3 below describe the cases respectively.
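The idle/shareable split in this example can be computed directly. The helpers below are a sketch under the definitions just given; the concrete numbers are illustrative assumptions standing in for N1, M11, M11', M12, M12':

```python
def idle_resources(total, allocated):
    """Idle = total resources minus everything allocated to running tasks."""
    return total - sum(allocated.values())

def shareable_resources(allocated, used):
    """Shareable = the allocated-but-predicted-unused portion, per task."""
    return {t: allocated[t] - used[t] for t in allocated}

# H1G1 example: total N1=100; A1 allocated M11=40 but uses M11'=25;
# A2 allocated M12=50 but uses M12'=30 (numbers are illustrative).
alloc = {"A1": 40, "A2": 50}
used = {"A1": 25, "A2": 30}
print(idle_resources(100, alloc))        # → 10  (N1 - M11 - M12)
print(shareable_resources(alloc, used))  # → {'A1': 15, 'A2': 20}
```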
Example 1

In Example 1, a GPU's allocatable resources are its idle resources. The structure of the determining unit 13 is shown in Fig. 7 and includes a judging subunit 131 and a determining subunit 132, wherein:

the judging subunit 131 is configured to judge whether, among the GPUs of the hosts, there is any candidate GPU whose allocatable resources are greater than or equal to the required resources, and to trigger the determining subunit 132 if a candidate GPU exists;

the determining subunit 132 is configured to select one of the candidate GPUs as the target GPU.

The determining subunit 132 may select a candidate GPU at random, or may select the candidate GPU with the fewest allocatable resources; this application does not strictly limit the choice.

Preferably, when the judging subunit 131 determines that no candidate GPU exists and the new task has a high priority, to ensure that high-priority tasks can be executed in time, the judging subunit 131 is further configured to: if no candidate GPU exists, judge whether the hosts contain a preemptible task, i.e., a task whose priority is lower than the new task's and whose allocated resources are greater than or equal to the required resources; if a preemptible task exists, select one of them as the target task, allocate the target task's resources to the new task, and assign the new task to the host where the target task is located; if no preemptible task exists, place the new task in a preset blocking pool to wait for resource allocation.
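Example 1's decision order (candidate GPU first, then preemption, then the blocking pool) can be sketched as follows. The task/GPU record fields and the priority convention (higher number = higher priority) are illustrative assumptions:

```python
def place_task(gpus, running_tasks, blocking_pool, task):
    """gpus: {gpu: idle_amount}; running_tasks: list of
    {"id", "priority", "allocated", "host"}. Higher number = higher priority."""
    # Step 1: prefer a candidate GPU with enough idle resources.
    candidates = [g for g, idle in gpus.items() if idle >= task["demand"]]
    if candidates:
        return ("allocate", min(candidates, key=lambda g: gpus[g]))
    # Step 2: otherwise look for a preemptible lower-priority task
    # whose allocation covers the demand.
    preemptible = [t for t in running_tasks
                   if t["priority"] < task["priority"]
                   and t["allocated"] >= task["demand"]]
    if preemptible:
        return ("preempt", preemptible[0]["id"])
    # Step 3: no resources anywhere; wait in the blocking pool.
    blocking_pool.append(task)
    return ("blocked", None)

pool = []
gpus = {"H1G1": 10, "H1G2": 20}
running = [{"id": "A1", "priority": 1, "allocated": 40, "host": "H1"}]
print(place_task(gpus, running, pool, {"id": "B", "priority": 5, "demand": 30}))
# → ('preempt', 'A1')
```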
Example 2

In Example 2, a GPU's allocatable resources are its shareable resources. The structure of the determining unit 13 is as shown in Fig. 7, including the judging subunit 131 and the determining subunit 132, whose specific functions are as in Example 1 and are not repeated here.

Preferably, since a GPU's shareable resources are part of resources already allocated to a running task, that running task may need a larger amount of resources again after a period of time. To ensure that such running tasks can finish smoothly, the embodiments of the present invention stipulate that a GPU's shareable resources may only be allocated to a new task whose priority is lower than that of every running task on that GPU. In Example 2, therefore, the determining subunit 132 selects the target GPU specifically as follows: from the candidate GPUs, select one on which all running tasks have a higher priority than the new task.

Preferably, when the judging subunit 131 determines that no candidate GPU exists and the new task has a high priority, to ensure that high-priority tasks can be executed in time, the judging subunit 131 is further configured to: judge whether the hosts contain a preemptible task whose priority is lower than the new task's and whose allocated resources are greater than or equal to the required resources; if a preemptible task exists, select one as the target task, allocate its resources to the new task, and assign the new task to the host where the target task is located; if no preemptible task exists, place the new task in the preset blocking pool to wait for resource allocation.

In the embodiments of the present invention, the master program may periodically select from the blocking pool the task with the highest priority, or the task that has been in the pool the longest, and send the selected task to the parsing unit 12 as a new task.
Example 3

In Example 3, a GPU's allocatable resources are its idle resources and its shareable resources. The structure of the determining unit 13 is shown in Fig. 8 and includes a first judging subunit 133, a first determining subunit 134, a second judging subunit 135, and a second determining subunit 136, wherein:

the first judging subunit 133 is configured to judge whether, among the GPUs of the hosts, there is any first candidate GPU whose idle resources are greater than or equal to the required resources; if a first candidate GPU exists, trigger the first determining subunit 134; otherwise, trigger the second judging subunit 135;

the first determining subunit 134 is configured to select one of the first candidate GPUs as the target GPU;

the second judging subunit 135 is configured to judge whether, among the GPUs of the hosts, there is any second candidate GPU whose shareable resources are greater than or equal to the required resources; if a second candidate GPU exists, trigger the second determining subunit 136;

the second determining subunit 136 is configured to select one of the second candidate GPUs as the target GPU.

Preferably, the second determining subunit 136 is specifically configured to: from the second candidate GPUs, select one on which all running tasks have a higher priority than the new task.

Preferably, when the first judging subunit 133 determines that no first candidate GPU exists and the new task has a high priority, to ensure that high-priority tasks can be executed in time, the first judging subunit 133 is further configured to: judge whether the hosts contain a preemptible task whose priority is lower than the new task's and whose allocated resources are greater than or equal to the required resources; if a preemptible task exists, select one as the target task, allocate its resources to the new task, and assign the new task to the host where the target task is located; if no preemptible task exists, trigger the second judging subunit 135.

Preferably, to ensure that the new task can be executed in time, the determining unit 13 shown in Fig. 8 may further include a third judging subunit 137 and a third determining subunit 138, as shown in Fig. 9, wherein:

the second judging subunit 135 is further configured to trigger the third judging subunit 137 when no second candidate GPU exists;

the third judging subunit 137 is configured to judge whether, among the GPUs of the hosts, there is any third candidate GPU on which the sum of idle resources and shareable resources is greater than or equal to the required resources; if a third candidate GPU exists, trigger the third determining subunit 138; otherwise, place the new task in the preset blocking pool to wait for resource allocation;

the third determining subunit 138 is configured to select one of the third candidate GPUs as the target GPU.

Preferably, in the embodiments of the present invention, the third determining subunit 138 is specifically configured to: from the third candidate GPUs, select one on which all running tasks have a higher priority than the new task.
Embodiment 3

In Embodiment 3 of the present invention, the worker program in the worker host may be implemented by the resource scheduling apparatus shown in Fig. 10, which includes a resource determining unit 21, a communication unit 22, and an execution unit 23, wherein:

the resource determining unit 21 is configured to determine the allocatable resources of each GPU in the host;

the communication unit 22 is configured to send the allocatable resources of each GPU to the master server;

the execution unit 23 is configured to execute the tasks the master server assigns to the host.
Preferably, in Embodiment 3 of the present invention, a GPU's allocatable resources may be its idle resources, its shareable resources, or both.

In one instance, the allocatable resources are the GPU's idle resources, and the resource determining unit 21 is specifically configured to: monitor, for each GPU in the host, the idle resources not allocated to any running task, and treat the idle resources as the allocatable resources.

In another instance, the allocatable resources are the GPU's shareable resources, and the resource determining unit 21 is specifically configured to: predict, among the resources of each GPU already allocated to running tasks, the shareable portion that will go unused by those tasks for a period of time, and treat the shareable resources as the allocatable resources.

In yet another instance, the allocatable resources are the GPU's idle resources and shareable resources, and the resource determining unit 21 is specifically configured to: monitor the idle resources of each GPU not allocated to any running task, predict the shareable portion of the resources already allocated to running tasks that will go unused for a period of time, and treat both the idle resources and the shareable resources as the allocatable resources.

In the embodiments of the present invention, the resource determining unit 21 predicts the shareable resources as follows: by monitoring the resource utilization of each running task on each GPU over a historical time window, it predicts each running task's utilization over a coming period, and treats the portion predicted to go unused as shareable. For example, suppose a GPU hosts one running task A with an allocated GPU resource amount of M, and monitoring shows that task A's utilization has stayed below 50% throughout a period T; it can then be predicted that task A's utilization will still not exceed 50% in the next period, and 50% of task A's allocated amount M is confirmed as shareable resources for that coming period.
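A minimal sketch of this utilization-based prediction, under the 50%-threshold example above; the sampling-window format and threshold parameter are illustrative assumptions:

```python
def predict_shareable(allocated, utilization_history, threshold=0.5):
    """If a task's sampled utilization stayed below `threshold` over the whole
    history window, declare the unused fraction of its allocation shareable."""
    if utilization_history and max(utilization_history) < threshold:
        return allocated * (1 - threshold)  # e.g. 50% of M
    return 0.0  # no confident prediction; nothing is shareable

# Task A: allocated M = 40; sampled utilization stayed below 50% over period T.
print(predict_shareable(40, [0.31, 0.28, 0.45, 0.40]))  # → 20.0
print(predict_shareable(40, [0.31, 0.80, 0.45]))        # → 0.0
```

A conservative rule like this errs on the side of the running task: any sample above the threshold means no resources are offered for sharing.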
Preferably, the execution unit 23 is specifically configured to: upon receiving a first instruction to execute a new task using a target GPU's idle resources, execute the new task using the target GPU's idle resources; and upon receiving a second instruction to execute a new task using a target GPU's shareable resources, execute the new task using the target GPU's shareable resources.

Preferably, the execution unit 23 is further configured to: when it detects that a high-priority task on a GPU needs more resources, stop running the low-priority task on that GPU and allocate the shareable resources that had been allocated to the low-priority task to the high-priority task.
Embodiment 4

Based on the resource scheduling apparatus of Embodiment 2, Embodiment 4 of the present invention provides a resource scheduling method applied to the master server of a distributed computing cluster in master-worker mode. The flowchart of the method is shown in Fig. 11 and includes:

Step 101: monitor the allocatable resources of each GPU in each host;

Step 102: when a new task is received, determine the resources required by the new task;

Step 103: determine, according to the allocatable resources of each GPU in the hosts, a target GPU whose allocatable resources satisfy the required resources;

Step 104: allocate resources for the new task from the allocatable resources of the target GPU, and assign the new task to the host where the target GPU is located.
In one specific instance, the allocatable resources are a GPU's idle resources, or the allocatable resources are a GPU's shareable resources; step 103 may then be implemented as shown in Fig. 12:

Step A1: judge whether, among the GPUs of the hosts, there is any candidate GPU whose allocatable resources are greater than or equal to the required resources; if a candidate GPU exists, perform step A2;

Step A2: select one of the candidate GPUs as the target GPU.

Preferably, when the allocatable resources are a GPU's shareable resources, step A2 specifically includes: from the candidate GPUs, select one on which all running tasks have a higher priority than the new task.

Preferably, step A1 of the flowchart shown in Fig. 12 further includes the following: if no candidate GPU exists, perform steps A3 to A5, as shown in Fig. 13:

Step A3: judge whether the hosts contain a preemptible task whose priority is lower than the new task's and whose allocated resources are greater than or equal to the required resources; if a preemptible task exists, perform step A4; otherwise, perform step A5;

Step A4: select one of the preemptible tasks as the target task, allocate the target task's resources to the new task, and assign the new task to the host where the target task is located;

Step A5: place the new task in the preset blocking pool to wait for resource allocation.
在另一个实例中,可分配资源包括GPU中的空闲资源和可共享资源,所述步骤103具体实现可如图14所示,包括:In another example, the allocatable resources include idle resources and sharable resources in the GPU. The specific implementation of
步骤B1、判断宿主机的各GPU中是否存在空闲资源大于等于所述需求资源的第一候选GPU;若存在第一候选GPU则执行步骤B2,若不存在第一候选GPU则执行步骤B3;Step B1, determine whether there is a first candidate GPU with idle resources greater than or equal to the required resource in each GPU of the host machine; if there is a first candidate GPU, then perform step B2, if there is no first candidate GPU, then perform step B3;
步骤B2、从所述第一候选GPU中选取一个GPU作为目标GPU;Step B2, select a GPU as the target GPU from the first candidate GPU;
步骤B3、判断宿主机的各GPU中是否存在可共享资源大于等于所述需求资源的第二候选GPU;若存在第二候选GPU则执行步骤B4;Step B3, judging whether there is a second candidate GPU whose shareable resource is greater than or equal to the required resource in each GPU of the host machine; if there is a second candidate GPU, step B4 is performed;
步骤B4、从所述第二候选GPU中选取一个GPU作为目标GPU。Step B4: Select one GPU from the second candidate GPU as the target GPU.
步骤B4具体用于:从所述第二候选GPU中选取一个包含的执行中任务的优先级均高于新任务的GPU作为目标GPU。Step B4 is specifically configured to: select a GPU whose priority of tasks in execution are higher than that of the new task from the second candidate GPU as the target GPU.
优选地,为确保高优先级的任务能够及时执行,在执行图14所示流程的步骤B3之前,可进一步包括步骤B5~步骤B6,如图15所示:Preferably, in order to ensure that high-priority tasks can be executed in time, before step B3 of the process shown in FIG. 14 is executed, steps B5 to B6 may be further included, as shown in FIG. 15 :
步骤B5、判断宿主机中是否存在优先级低于所述新任务、且分配的资源大于等于所述需求资源的可抢占任务;若存在可抢占任务则执行步骤B6,若不存在可抢占任务则执行步骤B3;Step B5, determine whether there is a preemptible task in the host machine with a priority lower than the new task and the allocated resource is greater than or equal to the required resource; if there is a preemptible task, perform step B6, if there is no preemptible task, then Execute step B3;
步骤B6、从所述可抢占任务中选取一个目标任务,将所述目标任务的资源分配给所述新任务,并将所述新任务分配给所述目标任务所在的宿主机。Step B6: Select a target task from the preemptible tasks, allocate the resources of the target task to the new task, and allocate the new task to the host machine where the target task is located.
Preferably, in the flows shown in FIG. 14 and FIG. 15, if no second candidate GPU exists, steps B7 and B8 may further be performed; FIG. 16 shows the flow of FIG. 15 extended with steps B7 and B8:

Step B7: determine whether any GPU of the host is a third candidate GPU whose idle resources and shareable resources together are greater than or equal to the required resources; if a third candidate GPU exists, perform step B8; otherwise, place the new task in a blocking pool to wait for resources.

Step B8: select one GPU from the third candidate GPUs as the target GPU.

Preferably, step B8 may specifically select, from the third candidate GPUs, a GPU on which every executing task has a higher priority than the new task as the target GPU.
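The fallback of steps B7-B8, including the blocking pool, can be sketched as follows (again illustrative only; the dict fields and the tie-breaking when no GPU satisfies the preferred priority criterion are assumptions):

```python
def select_third_candidate(gpus, required, new_task_priority, blocking_pool, task):
    """Steps B7-B8: when neither idle nor shareable resources alone suffice,
    consider their sum; if no GPU qualifies, park the task in the blocking pool."""
    # Step B7: third-candidate GPUs (idle + shareable >= demand)
    third = [g for g in gpus if g["idle"] + g["shareable"] >= required]
    if not third:
        blocking_pool.append(task)   # wait here until resources are freed
        return None
    # Step B8 (preferred variant): favour a GPU whose running tasks all
    # outrank the new task; otherwise take any third candidate
    for g in third:
        if all(p > new_task_priority for p in g["priorities"]):
            return g
    return third[0]
```

Tasks in the blocking pool would be re-examined by the scheduler when resources are released, though the text does not spell out that retry mechanism.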
Embodiment 5

Based on the same concept as the resource scheduling apparatus of Embodiment 3, Embodiment 5 of the present invention provides a resource scheduling method applicable to a worker-side host in a distributed computing cluster in master-worker mode. As shown in FIG. 17, the method includes:
Step 201: determine the allocatable resources of each GPU of the host.

Step 202: send the allocatable resources of each GPU to the master-side server.

Step 203: execute the tasks assigned to the host by the master-side server.
In one example, a GPU's allocatable resources are its idle resources, and step 201 is specifically implemented as follows: monitor, on each GPU of the host, the idle resources not allocated to any executing task, and use these idle resources as the allocatable resources.

In another example, a GPU's allocatable resources are its shareable resources, and step 201 is specifically implemented as follows: among the resources of each GPU already allocated to executing tasks, predict the shareable resources that will not be used by those tasks over a period of time, and use these shareable resources as the allocatable resources.

In yet another example, a GPU's allocatable resources are its idle resources and its shareable resources, and step 201 is specifically implemented as follows: monitor, on each GPU of the host, the idle resources not allocated to any executing task; predict, among the resources already allocated to executing tasks, the shareable resources that will not be used by those tasks over a period of time; and use the idle resources and the shareable resources together as the allocatable resources.
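The combined variant of step 201 can be sketched as below. How the unused share of each task's allocation is predicted is not specified by the text, so the sketch assumes a precomputed `predicted_use` value per task; all field names are hypothetical:

```python
def allocatable_resources(gpus):
    """Step 201 (combined variant): for each GPU, report the idle units not
    allocated to any executing task plus the shareable units -- allocated to
    tasks but predicted to go unused over a period of time."""
    report = {}
    for gpu_id, gpu in gpus.items():
        allocated = sum(t["allocated"] for t in gpu["tasks"])
        idle = gpu["total"] - allocated            # monitored idle resources
        shareable = sum(max(0, t["allocated"] - t["predicted_use"])
                        for t in gpu["tasks"])     # stand-in for the predictor
        report[gpu_id] = {"idle": idle, "shareable": shareable}
    return report  # step 202: the worker sends this report to the master server
```

The returned per-GPU report is exactly what step 202 would transmit, giving the master the two resource pools it needs for the candidate checks of steps B1-B8.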
Preferably, step 203 specifically includes: upon receiving a first instruction to execute a new task using the idle resources of a target GPU, executing the new task with the target GPU's idle resources; and upon receiving a second instruction to execute a new task using the shareable resources of a target GPU, executing the new task with the target GPU's shareable resources.

Preferably, step 203 further includes: upon detecting that a high-priority task on a GPU needs more resources, stopping a low-priority task on that GPU and allocating the shareable resources assigned to the low-priority task to the high-priority task.
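The worker-side behaviour of step 203, including the reclaim path, can be sketched as follows (illustrative only; the instruction format and dict fields are assumptions for the sketch):

```python
def execute_instruction(gpu, instruction):
    """Step 203: run the new task on the resource pool named by the
    instruction -- idle resources for a first instruction, shareable
    resources for a second instruction."""
    pool = "idle" if instruction["kind"] == "first" else "shareable"
    gpu[pool] -= instruction["demand"]            # charge the chosen pool
    gpu["tasks"].append({"name": instruction["task"], "pool": pool,
                         "demand": instruction["demand"], "running": True})

def reclaim_for_high_priority(gpu, low_task, high_task):
    """Step 203 (further): when a high-priority task on the GPU needs more
    resources, stop the low-priority task and hand its units back."""
    low_task["running"] = False                   # stop the low-priority task
    high_task["demand"] += low_task["demand"]     # units return to their owner
```

A task running on shareable resources is thus inherently best-effort: it borrows capacity that the owning high-priority task may take back at any time.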
The basic principles of the present invention have been described above with reference to specific embodiments. Those of ordinary skill in the art will understand that all or any of the steps or components of the method and apparatus of the present invention may be implemented in hardware, firmware, software, or a combination thereof, on any computing device (including processors, storage media, and the like) or network of computing devices, using no more than basic programming skills after reading this description.

Those of ordinary skill in the art will also understand that all or part of the steps of the above method embodiments may be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, may each exist physically as a separate unit, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the device to produce a computer-implemented process, such that the instructions executed on the device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although the above embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, may make further changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the above embodiments together with all changes and modifications that fall within the scope of the present invention.

It will be apparent to those skilled in the art that various changes and variations can be made to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to include them as well.
Claims (7)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN201711362963.6A (CN109936604B) | 2017-12-18 | 2017-12-18 | Resource scheduling method, device and system |
Publications (2)

| Publication Number | Publication Date |
| --- | --- |
| CN109936604A | 2019-06-25 |
| CN109936604B | 2022-07-26 |
Family
ID=66982307
Legal Events

| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| 2020-03-27 | TA01 | Transfer of patent application right | Applicant changed from BEIJING TUSEN WEILAI TECHNOLOGY Co.,Ltd. to BEIJING TUSEN ZHITU TECHNOLOGY Co.,Ltd.; address unchanged: No. 1, Road Two, Shunyi Park, Zhongguancun Science and Technology Park, Shunyi District, Beijing, 101300 |
| | GR01 | Patent grant | |
| | CP03 | Change of name, title or address | Patentee changed from BEIJING TUSEN ZHITU TECHNOLOGY Co.,Ltd. to Beijing Original Generation Technology Co.,Ltd.; address and country (China) unchanged |