CN116401062B - Server-agnostic resource processing method, apparatus, and electronic device
Classifications
- G06F9/5077 — Logical partitioning of resources; management or configuration of virtualized resources
- G06F9/5016 — Allocation of resources to service a request, the resource being the memory
- G06F9/5066 — Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Description
Technical Field

The present disclosure relates to the field of computer technology, and in particular to a server-agnostic resource processing method, apparatus, and electronic device.
Background

Container technology is widely used in data centers as a resource management mechanism. It provides lightweight virtualization, significantly reducing the complexity and cost of data center management and deployment.

Deep learning (DL) is an emerging data center workload. With recent breakthroughs in deep neural network (DNN) research, deep learning models are increasingly integrated into applications and online services. Large enterprises have also built multi-tenant shared GPU (Graphics Processing Unit) clusters so that multiple teams can develop and train deep learning models concurrently.

To support deep learning workloads in container clouds, an entire GPU is usually statically allocated to a container. Once a GPU is assigned to a container, that container gains exclusive access to it. Even if the GPU is underutilized or completely idle, other containers on the same machine cannot use it; this guarantees performance isolation. Performance isolation is crucial for production deployments: when a deep learning training task in a container has exclusive use of a GPU, it is not interfered with by other tasks, so its throughput is predictable.

However, exclusive GPU allocation leads to very low GPU utilization. Even when many GPUs are underutilized, some tasks may sit in the waiting queue to be scheduled, resulting in longer task completion times. One study of a production GPU cluster reports an average GPU utilization of only 52%; another reports a median GPU utilization no higher than 10%.

This is a known problem of GPU clusters in production environments, and it can be addressed by sharing GPUs to improve utilization. Existing GPU sharing schemes fall into two categories: application-layer solutions and system-layer solutions. AntMan (a framework-layer GPU sharing scheme from the Alibaba Cloud artificial intelligence platform) is the state-of-the-art application-layer solution. AntMan provides high GPU utilization and performance isolation, but it requires complex modifications to the deep learning framework and restricts container users to specific framework versions. This solution is therefore not server-agnostic: users must account for deep learning framework compatibility when developing and configuring software. MPS (Multi-Process Service) is a system-layer solution. However, MPS is not fully server-agnostic either: users need application knowledge to set resource limits that guarantee performance isolation, and it does not support GPU sharing when video memory is oversubscribed. Moreover, MPS merges multiple CUDA (Compute Unified Device Architecture, a hardware and software architecture for GPU computing) contexts into one, which leads to intolerable fault sharing between tasks.
Summary of the Invention

In view of the above problems, embodiments of the present disclosure provide a server-agnostic resource processing method, apparatus, and electronic device to overcome, or at least partially solve, the above problems.

A first aspect of the embodiments of the present disclosure provides a server-agnostic resource processing method, including:

obtaining a first function call issued by a resource-guaranteed task to a target graphics processor, and obtaining the arrival rate of the first function call, where the target graphics processor is one of multiple graphics processors included in a graphics processor cluster, and the arrival rate of the first function call is the rate at which the first function call arrives at a rate monitor;

allocating video memory for the first function call from a unified video memory address space according to the arrival rate of the first function call, where the unified video memory address space includes the video memory of the multiple graphics processors and the main memory of the multiple graphics processors;

obtaining a second function call issued by an opportunistic task to any of the graphics processors, and obtaining the arrival rate of the second function call;

obtaining a predetermined target rate, where the target rate is the arrival rate of function calls to the target graphics processor when the resource-guaranteed task has exclusive use of the target graphics processor's video memory;

determining the departure rate of the second function call according to the arrival rate of the first function call and the target rate, where the departure rate of the second function call is the rate at which the second function call leaves a rate controller;

allocating video memory or main memory for the second function call from the unified video memory address space according to the departure rate of the second function call.
Optionally, determining the departure rate of the second function call according to the arrival rate of the first function call and the target rate includes:

determining the maximum departure rate of the second function call while ensuring that the arrival rate of the first function call is greater than or equal to the target rate;

determining the maximum departure rate of the second function call as the departure rate of the second function call.

Optionally, determining the maximum departure rate of the second function call while ensuring that the arrival rate of the first function call is greater than or equal to the target rate includes:

when the arrival rate of the first function call is greater than or equal to the target rate, linearly increasing the departure rate of the second function call to obtain the maximum departure rate of the second function call;

when the arrival rate of the first function call is less than the target rate, multiplicatively decreasing the current departure rate of the second function call until the arrival rate of the first function call is no longer less than the target rate, to obtain the maximum departure rate of the second function call.
Optionally, allocating video memory for the first function call from the unified video memory address space according to the arrival rate of the first function call includes:

determining a first video memory size required by the first function call according to the arrival rate of the first function call;

when the amount of free video memory in the unified video memory address space is smaller than the first video memory size, migrating opportunistic tasks that occupy video memory to main memory, so that the amount of free video memory is no smaller than the first video memory size;

allocating video memory for the first function call.

Optionally, allocating video memory or main memory for the second function call from the unified video memory address space according to the departure rate of the second function call includes:

determining a second video memory size required by the second function call according to the departure rate of the second function call;

when the amount of free video memory in the unified video memory address space is no smaller than the second video memory size, allocating video memory for the second function call from the unified video memory address space;

when the amount of free video memory in the unified video memory address space is smaller than the second video memory size, allocating main memory for the second function call from the unified video memory address space.
Optionally, before obtaining the first function call issued by the resource-guaranteed task to the target graphics processor, the method further includes:

determining the video memory of the multiple graphics processors and the main memory of the multiple graphics processors;

determining the unified video memory address space according to the video memory of the multiple graphics processors and the main memory of the multiple graphics processors.

Optionally, before obtaining the first function call issued by the resource-guaranteed task to the target graphics processor, the method further includes:

determining, as the target rate, the arrival rate of function calls to the target graphics processor when the resource-guaranteed task has exclusive use of the target graphics processor.

Optionally, allocating video memory for the first function call from the unified video memory address space includes:

allocating a first virtual address for the first function call from the unified video memory address space;

when the resource-guaranteed task triggers a fault accessing the first virtual address, allocating a first real physical address in video memory for the first function call from the unified video memory address space.

Allocating video memory or main memory for the second function call from the unified video memory address space includes:

allocating a second virtual address for the second function call from the unified video memory address space;

when the opportunistic task triggers a fault accessing the second virtual address, allocating, from the unified video memory address space, a second real physical address in video memory or a second real physical address in main memory for the second function call.
A second aspect of the embodiments of the present disclosure provides a server-agnostic resource processing apparatus, including:

a first acquisition module, configured to obtain a first function call issued by a resource-guaranteed task to a target graphics processor and obtain the arrival rate of the first function call, where the target graphics processor is one of multiple graphics processors included in a graphics processor cluster, and the arrival rate of the first function call is the rate at which the first function call arrives at a rate monitor;

a first allocation module, configured to allocate video memory for the first function call from a unified video memory address space according to the arrival rate of the first function call, where the unified video memory address space includes the video memory of the multiple graphics processors and the main memory of the multiple graphics processors;

a second acquisition module, configured to obtain a second function call issued by an opportunistic task to any of the graphics processors and obtain the arrival rate of the second function call;

a target acquisition module, configured to obtain a predetermined target rate, where the target rate is the arrival rate of function calls to the target graphics processor when the resource-guaranteed task has exclusive use of the target graphics processor's video memory;

a rate determination module, configured to determine the departure rate of the second function call according to the arrival rate of the first function call and the target rate, where the departure rate of the second function call is the rate at which the second function call leaves a rate controller;

a second allocation module, configured to allocate video memory or main memory for the second function call from the unified video memory address space according to the departure rate of the second function call.
A third aspect of the embodiments of the present disclosure provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; where the processor is configured to execute the executable instructions to implement the server-agnostic resource processing method of the first aspect.

A fourth aspect of the embodiments of the present disclosure provides a computer-readable storage medium. When instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the server-agnostic resource processing method of the first aspect.

Embodiments of the present disclosure include the following advantages:

In the embodiments of the present disclosure, the video memory of multiple graphics processors on a physical machine and main memory can be unified into a single unified video memory address space, which is then allocated, allowing resource-guaranteed tasks and opportunistic tasks to share GPUs and improving GPU utilization transparently to the server. The address space allocated to resource-guaranteed tasks is video memory, which guarantees their performance. Determining the departure rate of the second function call according to the arrival rate of the first function call and the target rate improves the performance of opportunistic tasks without affecting resource-guaranteed tasks. In this way, the embodiments of the present disclosure improve GPU utilization transparently to the server, without modifying the task framework, while guaranteeing performance isolation and without merging multiple CUDA contexts into one.
Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present disclosure; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is a schematic diagram of the TGS architecture in an embodiment of the present disclosure;

Figure 2 is a flow chart of the steps of a server-agnostic resource processing method in an embodiment of the present disclosure;

Figure 3 is a schematic flow chart of adaptive rate control in an embodiment of the present disclosure;

Figure 4 is a schematic diagram of the relationship between the arrival rate of the first function call and the departure rate of the second function call in an embodiment of the present disclosure;

Figure 5 is a schematic structural diagram of a server-agnostic resource processing apparatus in an embodiment of the present disclosure.
Detailed Description

To make the above objects, features, and advantages of the present disclosure more comprehensible, the present disclosure is described in further detail below with reference to the accompanying drawings and specific embodiments.

Resource-guaranteed tasks obtain performance guarantees similar to exclusive GPU use, while opportunistic tasks use spare GPU resources on a best-effort basis with no performance guarantee. Letting these two types of tasks share a GPU improves GPU utilization.

For the resource-guaranteed tasks referred to in the embodiments of the present disclosure, each function call issued to a graphics processor requests a similar amount of graphics processor address space, within the same size range. Therefore, the address space required by a resource-guaranteed task can be estimated from the rate at which it issues function calls to the graphics processor.

Likewise, for the opportunistic tasks referred to in the embodiments of the present disclosure, each function call issued to a graphics processor requests a similar amount of address space, within the same size range. Therefore, the address space required by an opportunistic task can be estimated from the rate at which it issues function calls to the graphics processor.

In production environments, deep learning training tasks can usually be divided into two categories: resource-guaranteed tasks and opportunistic tasks. Each function call a deep learning training task issues to the graphics processor requests a similar amount of address space, within the same size range. Therefore, the resource-guaranteed and opportunistic tasks referred to in the embodiments of the present disclosure may be deep learning training tasks, or other tasks whose required address space can be estimated from the rate at which they issue function calls.
The embodiments of the present disclosure propose a transparent GPU sharing system, named TGS, which can be used to execute the server-agnostic resource processing method proposed herein. TGS sits between the GPU and the containers; containers and the applications inside them are unaware of TGS. Each GPU is exposed to containers like an ordinary GPU, and users can use any framework and tool to develop and train deep learning models. TGS intercepts GPU-related function calls issued from containers and, by controlling these calls, controls each container's use of GPU resources.
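As an illustration only (this code is not part of the patent), interception of the kind described above is commonly built by preloading a shim library that re-exports CUDA runtime symbols and forwards them to the real implementation. The sketch below assumes that standard LD_PRELOAD technique; the tgs_note_arrival and tgs_wait_for_release hooks are hypothetical placeholders for the rate monitor and rate controller.

```cpp
// Minimal sketch of CUDA API interposition via an LD_PRELOAD shim (illustrative).
// Build as a shared library and set LD_PRELOAD so it loads before the CUDA runtime.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // for RTLD_NEXT
#endif
#include <dlfcn.h>
#include <cuda_runtime.h>

// Hypothetical TGS-side hooks (placeholders, not from the patent text).
void tgs_note_arrival();       // rate monitor: count one arriving call
void tgs_wait_for_release();   // rate controller: may delay opportunistic calls

extern "C" cudaError_t cudaLaunchKernel(const void* func, dim3 grid, dim3 block,
                                        void** args, size_t sharedMem,
                                        cudaStream_t stream) {
    using Fn = cudaError_t (*)(const void*, dim3, dim3, void**, size_t, cudaStream_t);
    static Fn real = reinterpret_cast<Fn>(dlsym(RTLD_NEXT, "cudaLaunchKernel"));
    tgs_note_arrival();        // the call "arrives" at the monitor here
    tgs_wait_for_release();    // and "departs" when the controller releases it
    return real(func, grid, block, args, sharedMem, stream);
}
```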
As shown in Figure 1, TGS may include a rate monitor, a rate controller, and a unified video memory module. The rate monitor obtains the rate at which each resource-guaranteed task and each opportunistic task issues function calls to any graphics processor; the rate controller controls the departure rate of opportunistic tasks; and the unified video memory module unifies the video memory of the multiple graphics processors on a physical machine and main memory into a unified video memory address space.

Application code inside a container is executed on the GPU in the form of functions known as GPU kernels. GPU kernels are highly optimized for the GPU architecture and execution model. A small deep neural network training task may not use all the computing resources of a GPU. In that case, if the GPU is monopolized by one container, GPU utilization is low. TGS allows one GPU to be shared by multiple containers, thereby improving GPU utilization.

TGS distinguishes resource-guaranteed tasks from opportunistic tasks. TGS guarantees that resource-guaranteed tasks are not affected by opportunistic tasks; opportunistic tasks only use GPU computing resources left spare by resource-guaranteed tasks.

A GPU has video memory resources independent of main memory. Modern GPUs have video memory ranging from a few GB to tens of GB. GPU video memory stores the state and data an application uses for GPU computation, and GPU compute units access video memory much faster than main memory. As with GPU computing resources, when a single container cannot use all of the GPU video memory, the video memory can be shared by multiple containers.
Referring to Figure 2, a flow chart of the steps of a server-agnostic resource processing method in an embodiment of the present disclosure is shown. As shown in Figure 2, the method may include steps S11 to S16.

Step S11: obtain a first function call issued by a resource-guaranteed task to a target graphics processor, and obtain the arrival rate of the first function call.

The target graphics processor is one of the multiple graphics processors included in a graphics processor cluster. The arrival rate of the first function call is the rate at which the first function call arrives at the rate monitor.

For a resource-guaranteed task, the issue rate, arrival rate, and departure rate of a first function call to any graphics processor are the same. The issue rate is the rate at which the resource-guaranteed task issues the first function call; the arrival rate is the rate at which the first function call arrives at the rate monitor; the departure rate is the rate at which the first function call leaves the rate controller. A first function call leaving the rate controller indicates that it will invoke the graphics processor. To guarantee the performance of resource-guaranteed tasks, their rate is only monitored, never throttled; therefore the issue, arrival, and departure rates of a resource-guaranteed task's first function calls are identical.

The target graphics processor can be any graphics processor among the multiple graphics processors in the GPU cluster.
Step S12: allocate video memory for the first function call from the unified video memory address space according to the arrival rate of the first function call.

The unified video memory address space includes the video memory of the multiple graphics processors and the main memory of the multiple graphics processors.

Before obtaining the first function call issued by the resource-guaranteed task to the target graphics processor, the unified video memory address space may first be established, specifically: determining the video memory of the multiple graphics processors and the main memory of the multiple graphics processors; and determining the unified video memory address space according to the video memory of the multiple graphics processors and the main memory of the multiple graphics processors.

In the related art, GPUs provide a feature called unified memory, which places GPU video memory and main memory in the same address space. The unified memory feature is usually used to simplify GPU memory management. TGS uses CUDA unified memory in a novel way: it redirects all GPU video memory allocations in the same GPU cluster to allocations in the unified video memory address space, achieving GPU video memory sharing that is transparent to applications and preserves performance isolation.
Specifically, TGS disguises the unified video memory address space as ordinary GPU video memory and exposes it to containers. When a container calls a GPU video memory allocation API (Application Programming Interface), whether the request targets ordinary GPU video memory or the unified address space, TGS intercepts the call and allocates the corresponding memory within the unified video memory address space. When resource-guaranteed tasks have not used all the video memory, opportunistic tasks can use the remaining GPU video memory.
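For illustration, the redirection described above can be sketched with the public CUDA runtime API, assuming an intercepted allocation is served by cudaMallocManaged, which allocates in the unified address space. This is a minimal sketch under that assumption, not the patent's actual implementation.

```cpp
// Illustrative sketch: serve an intercepted allocation from unified memory.
// The container receives an ordinary-looking pointer, but the backing pages
// can live in GPU video memory or host memory and migrate on demand.
#include <cuda_runtime.h>

extern "C" cudaError_t cudaMalloc(void** devPtr, size_t size) {
    // Redirect the ordinary device allocation into the unified address space.
    return cudaMallocManaged(devPtr, size, cudaMemAttachGlobal);
}
```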
Fake GPU video memory appears to containers and applications as ordinary GPU video memory, but it may actually reside in GPU video memory or in main memory; its location is adjusted mainly according to GPU video memory usage. Notably, the present disclosure does not change the virtual memory system: fake GPU video memory is still virtual memory, and tasks in a container access it through virtual addresses. A GPU virtual address is translated into a GPU physical address by the GPU's MMU (Memory Management Unit); a host virtual address is translated into a host physical address by the host's MMU.

The transparent unified memory mechanism adopted by TGS differs from the unified memory mechanism of the related art in two respects: performance isolation and transparent oversubscription of GPU video memory. To improve performance isolation, TGS configures the unified video memory address space with placement preferences so that resource-guaranteed tasks have higher video memory priority. When GPU video memory is not fully used, all memory requests of all tasks are served from GPU video memory. When GPU video memory is full, TGS tries to keep the data and state of resource-guaranteed tasks on the GPU and, when necessary, evicts the data and state of opportunistic tasks to main memory. This process is completely transparent to the applications in the containers, because a container still accesses its allocated memory through the same virtual addresses; the virtual addresses are simply translated to physical addresses in a different location.
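Purely as an illustration of such placement preferences (not the patent's code), the unified-memory hint and prefetch APIs of the CUDA runtime can express "prefer GPU residency" for a guaranteed task's buffer and "evict to host" for an opportunistic task's buffer:

```cpp
// Illustrative sketch: prefer GPU residency for a guaranteed task's buffer,
// and evict an opportunistic task's buffer to host memory when the GPU fills up.
#include <cuda_runtime.h>

void prefer_gpu(void* ptr, size_t bytes, int device, cudaStream_t s) {
    // Hint: keep this range resident on the GPU when possible.
    cudaMemAdvise(ptr, bytes, cudaMemAdviseSetPreferredLocation, device);
    cudaMemPrefetchAsync(ptr, bytes, device, s);   // pull it onto the GPU now
}

void evict_to_host(void* ptr, size_t bytes, cudaStream_t s) {
    // Re-home the range to host memory; the task keeps the same virtual address.
    cudaMemAdvise(ptr, bytes, cudaMemAdviseSetPreferredLocation, cudaCpuDeviceId);
    cudaMemPrefetchAsync(ptr, bytes, cudaCpuDeviceId, s);  // migrate pages to host
}
```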
Because the arrival rate and departure rate of a first function call are the same, because a first function call leaving the rate controller indicates that it will invoke the graphics processor, and because the address space a resource-guaranteed task needs can be estimated from the rate at which it issues function calls to the graphics processor, the first video memory size required by the first function call can be determined from the arrival rate of the first function call.

To guarantee the performance of resource-guaranteed tasks, video memory is allocated to them first; opportunistic tasks only use video memory when spare video memory is available. Therefore, when the amount of free video memory in the unified video memory address space is no smaller than the first video memory size, video memory can be allocated to the first function call directly from the free video memory.

When the amount of free video memory in the unified video memory address space is smaller than the first video memory size, opportunistic tasks occupying video memory can be migrated to main memory, yielding free video memory no smaller than the first video memory size, from which video memory is then allocated to the first function call.
Step S13: obtain a second function call issued by an opportunistic task to any of the graphics processors, and obtain the arrival rate of the second function call.

The arrival rate of the second function call is the rate at which the second function call arrives at the rate monitor, and it equals the second function call's issue rate. However, to ensure that opportunistic tasks do not affect the performance of resource-guaranteed tasks, the departure rate of the second function call must not affect the departure rate of the first function call.

Step S14: obtain a predetermined target rate.

The target rate is the arrival rate of function calls to the target graphics processor when the resource-guaranteed task has exclusive use of the target graphics processor's video memory.

The target rate can be determined before obtaining the first function call issued by the resource-guaranteed task to the target graphics processor. The target rate is the arrival rate of function calls to the target graphics processor when the resource-guaranteed task has exclusive use of the target graphics processor. TGS can measure the target rate before opportunistic tasks begin to share graphics processor resources with resource-guaranteed tasks.
Step S15: determine the departure rate of the second function call according to the arrival rate of the first function call and the target rate.

The departure rate of the second function call is the rate at which the second function call leaves the rate controller.

From the arrival rate of the first function call and the target rate, it can be determined whether the arrival rate of the first function call is being affected: if the arrival rate of the first function call is greater than or equal to the target rate, it is unaffected; if it is less than the target rate, it is affected. To ensure that the departure rate of the second function call does not affect the departure rate of the first function call, the departure rate of the second function call is determined from the arrival rate of the first function call and the target rate.

To maximize GPU utilization without affecting the departure rate of the first function call, the maximum departure rate of the second function call can be determined and used as the departure rate of the second function call. How this maximum is determined is detailed below.
Step S16: allocate video memory or main memory for the second function call from the unified video memory address space according to the departure rate of the second function call.

Since the address space an opportunistic task needs can be estimated from the rate at which it issues function calls to the graphics processor, the second video memory size required by the second function call can be determined from the departure rate of the second function call.

When the amount of free video memory in the unified video memory address space is no smaller than the second video memory size, video memory can be allocated for the second function call directly from the unified video memory address space. When the amount of free video memory in the unified video memory address space is smaller than the second video memory size, main memory is allocated for the second function call from the unified video memory address space; see the sketch below.
By adopting the technical solutions of the embodiments of the present disclosure, GPU utilization can be improved transparently to the server, without modifying the task framework, while guaranteeing performance isolation and without merging multiple CUDA contexts into one.
The following describes how to determine the maximum departure rate of the second function call. Figure 3 is a schematic flow chart of adaptive rate control in an embodiment of the present disclosure. The rate control algorithm controls the departure rate of the opportunistic task's second function calls so as to maximize it while leaving the arrival rate of the resource-guaranteed task's first function calls unaffected. Formally, let α_in be the arrival rate of the first function call, α_out the departure rate of the first function call, β_in the arrival rate of the second function call, β_out the departure rate of the second function call, and R the target rate. Because the rate of resource-guaranteed tasks is only monitored and never throttled, α_in = α_out.

The goal of the rate control algorithm is to maximize β_out under the constraint α_in ≥ R. Here β_out is the controllable variable, and changes in α_in are influenced by β_out. Let f be the function capturing the relationship between α_in and β_out, i.e., α_in = f(β_out). The rate control algorithm then essentially solves the following optimization problem:
max  β_out
s.t. α_in = f(β_out) ≥ R
     β_out ≥ 0
Although the exact form of f(β_out) is unknown, its general shape can be inferred from the nature of the problem. Figure 4 shows the relationship between the arrival rate of the first function call and the departure rate of the second function call. As shown in Figure 4(a), typically when β_out is small, α_in is flat and equal to R; when β_out is large, α_in decreases monotonically. The intuition behind this is: when β_out is small, the GPU is not fully utilized, and executing the opportunistic task's second function calls does not affect the performance of the resource-guaranteed task, so the curve is flat; past a turning point, the GPU is fully utilized, and giving the opportunistic task more GPU resources degrades the resource-guaranteed task, so the curve decreases monotonically. Note that the decreasing part is not necessarily linear; Figure 4(a) only depicts the general trend. The goal of the adaptive rate control algorithm is to find the turning point at which the curve starts to decrease.

Figure 4(a) shows only the common case; there are two special cases. Figure 4(b) shows the special case where the GPU is completely saturated by the resource-guaranteed task; releasing even a small number of the opportunistic task's second function calls affects performance, so the curve has no flat part. Figure 4(c) shows the special case where the opportunistic task uses so few GPU resources that it never affects the performance of the resource-guaranteed task; in that case the curve has no monotonically decreasing part.

To find the optimal β_out, the embodiments of the present disclosure use an AIMD (Additive Increase, Multiplicative Decrease) algorithm to control β_out. Once opportunistic tasks begin sharing with resource-guaranteed tasks, if α_in ≥ R, TGS increases β_out linearly; otherwise it decreases β_out multiplicatively. The AIMD algorithm guarantees that β_out quickly converges to the turning point.

In addition, when a resource-guaranteed task changes its GPU usage, TGS can detect that the change in the arrival rate of the first function call exceeds a threshold. In that case, the control module decreases β_out multiplicatively. Once the rate of the resource-guaranteed task stabilizes, the control module again uses AIMD to adjust β_out until it converges to the new turning point.

Therefore, when the arrival rate of the first function call is greater than or equal to the target rate, the departure rate of the second function call is increased linearly to obtain its maximum; when the arrival rate of the first function call is less than the target rate, the current departure rate of the second function call is decreased multiplicatively until the arrival rate of the first function call is no longer less than the target rate, yielding the maximum departure rate of the second function call.
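A minimal sketch of such an AIMD controller (illustrative; the step size and decrease factor are assumptions, not values from the patent):

```cpp
// Illustrative AIMD controller for the opportunistic departure rate beta_out.
#include <algorithm>

struct AimdController {
    double beta_out = 0.0;      // current departure rate of second function calls
    double step     = 10.0;     // additive increase per control interval (assumed)
    double factor   = 0.5;      // multiplicative decrease factor (assumed)

    // alpha_in: measured arrival rate of first function calls; R: target rate.
    void update(double alpha_in, double R) {
        if (alpha_in >= R)
            beta_out += step;   // guaranteed task unaffected: probe upward
        else
            beta_out = std::max(0.0, beta_out * factor);  // back off quickly
    }
};
```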
TGS adopts an adaptive rate control method whose core idea is to carefully control the departure rate of the second function calls in the opportunistic queue according to the arrival rate of the resource-guaranteed task's first function calls, so that opportunistic tasks consume exactly the remaining GPU resources. TGS decouples control from low-level GPU details and does not require control over GPU-internal hardware.

This method needs a feedback signal to tell the control loop whether to release second function calls from the opportunistic queue to increase GPU resource usage, or to decrease their departure rate to avoid affecting resource-guaranteed tasks. Ideally, user-level performance (e.g., the training throughput of a deep learning training task) would be used as the feedback signal, since that is the metric users actually care about. However, obtaining training throughput requires user-level knowledge, which contradicts the design of a server-agnostic system-level solution.

A possible design choice is to use GPU utilization, i.e., increasing the departure rate of the second function calls whenever GPU utilization is below 100%. However, the definition of GPU utilization is hardware-dependent and often ambiguous. Today's GPUs and deep learning accelerators are themselves heterogeneous devices containing multiple kinds of compute units, e.g., Tensor Cores and CUDA Cores on NVIDIA GPUs. The GPU utilization reported by the driver is hard to make accurate, and even if it were accurate, it would carry little real meaning on a GPU with multiple kinds of compute units. Therefore, even when GPU utilization is below 100%, it does not follow that the opportunistic task's GPU resources can be increased without affecting resource-guaranteed tasks. For example, a resource-guaranteed task and an opportunistic task may compete for the same already-exhausted resource; even if other resources are idle and GPU utilization is below 100%, the resource-guaranteed task may still be interfered with.

TGS uses the arrival rate of the first function call as the feedback signal. A deep learning training task builds a computation graph from the deep neural network model and generates and issues function calls according to that graph; the graph captures the dependencies between function calls. The arrival rate of function calls directly reflects training throughput: if a training task slows down, the arrival rate of its function calls slows down. TGS therefore uses the rate monitor to detect the arrival rate of the resource-guaranteed task's first function calls and uses it as the feedback signal to control the departure rate of the opportunistic task's second function calls. Notably, any contention between resource-guaranteed and opportunistic tasks can be measured with this signal, including GPU cache contention, CPU contention, and network contention; some of these are beyond what GPU hardware design can control, yet TGS can handle them all. Because the arrival rates of function calls (both first and second) fluctuate slightly, TGS smooths its estimate of the arrival rate with a moving average. For the arrival rate of a resource-guaranteed task's first function calls, TGS performs only a simple counting operation to estimate it. TGS does not buffer first function calls; it passes them directly to the GPU to minimize the performance impact on resource-guaranteed tasks.
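One way to realize the counting-plus-smoothing monitor described above (an illustrative sketch; the EWMA weight is an assumption):

```cpp
// Illustrative rate monitor: count arrivals per interval, smooth with an EWMA.
#include <atomic>
#include <cstdint>

class RateMonitor {
    std::atomic<std::uint64_t> count_{0};
    double ewma_rate_ = 0.0;               // smoothed calls per second
    static constexpr double kAlpha = 0.2;  // EWMA weight (assumed)

public:
    void note_arrival() { count_.fetch_add(1, std::memory_order_relaxed); }

    // Called once per control interval of length dt seconds.
    double sample(double dt) {
        double instant = static_cast<double>(count_.exchange(0)) / dt;
        ewma_rate_ = kAlpha * instant + (1.0 - kAlpha) * ewma_rate_;
        return ewma_rate_;
    }
};
```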
TGS利用自适应速率控制机制和无感知统一显存机制来解决系统层无感知共享GPU的两个主要挑战。第一个挑战是如何在缺乏应用知识的情况下,自适应地在不同容器之间共享GPU计算资源。当前最新解决方案要么如AntMan一样,限制用户使用修改后地深度学习框架;要么如MPS一样,需要利用应用知识显式设置GPU计算资源限制。为了解决这个挑战,TGS的速率监控器会监视每个容器的性能,并给控制循环提供实时信号。基于这个信号,TGS的速率控制器就能自适应地控制每个容器发给GPU的函数调用。整个控制循环会最终自动收敛到一个稳定状态,即机会型任务尽可能多地使用空闲GPU资源,而又不影响资源保障型任务的性能隔离。TGS uses an adaptive rate control mechanism and a non-aware unified memory mechanism to solve the two main challenges of system-level non-aware shared GPUs. The first challenge is how to adaptively share GPU computing resources among different containers without application knowledge. The latest current solutions either restrict users from using modified deep learning frameworks, like AntMan; or, like MPS, require explicit use of application knowledge to set GPU computing resource limits. To address this challenge, TGS's rate monitor monitors the performance of each container and provides real-time signals to the control loop. Based on this signal, TGS's rate controller can adaptively control the function calls sent to the GPU by each container. The entire control loop will eventually automatically converge to a stable state, where opportunistic tasks use as much idle GPU resources as possible without affecting the performance isolation of resource-guaranteed tasks.
第二个挑战是提供无感知GPU显存超售。AntMan修改了深度学习框架来在GPU显存超售时将一部分GPU显存内数据换出至主存。系统层解决方案MPS则不支持GPU显存超售,需要应用自身处理显存和主存之间的换入换出。这些方法都不是无感知的。为了解决这一挑战,TGS利用了统一显存模块,将多个GPU的显存和主存统一在了一个地址空间下。TGS劫持并重定向每个容器的GPU显存分配至统一显存地址空间。当GPU显存被超售时,TGS可以自动驱逐部分机会型任务存放在显存上的数据至主存中,并且将相应修改虚拟地址映射至主存上的新物理地址。整个过程应用都是无感知的,且不影响应用运行。为了保证性能隔离,TGS使用放置偏好来优先将资源保障型任务的数据分配至GPU显存上,空余的GPU显存则可以分配给机会型任务。The second challenge is to provide GPU memory oversubscription without awareness. AntMan modified the deep learning framework to swap out part of the data in the GPU memory to the main memory when the GPU memory is oversold. The system layer solution MPS does not support GPU memory overbooking and requires the application itself to handle the swapping in and out between graphics memory and main memory. None of these methods are mindless. In order to solve this challenge, TGS uses a unified video memory module to unify the video memory and main memory of multiple GPUs under one address space. TGS hijacks and redirects the GPU memory of each container to a unified memory address space. When the GPU video memory is oversubscribed, TGS can automatically evict the data stored in the video memory by some opportunistic tasks to the main memory, and map the corresponding modified virtual address to the new physical address on the main memory. The entire application process is imperceptible and does not affect application operation. In order to ensure performance isolation, TGS uses placement preferences to prioritize data for resource-guaranteed tasks to GPU memory, and spare GPU memory can be allocated to opportunistic tasks.
TGS的整体设计还体现了两个其他优势。首先,TGS整体架构是轻量级的,它没有采用重量级的二进制翻译来实现GPU共享,这保证了它的低性能开销,也和容器技术的原则一致。其次,TGS提供了和普通容器一样的错误隔离能力。使用TGS的不同容器使用不同的CUDA上下文,而不是像MPS一样将多个CUDA上下文合并为一个。因此,一个容器内应用的错误不会影响到其他容器。The overall design of the TGS also reflects two other advantages. First of all, the overall architecture of TGS is lightweight. It does not use heavyweight binary translation to achieve GPU sharing, which ensures its low performance overhead and is consistent with the principles of container technology. Secondly, TGS provides the same error isolation capabilities as ordinary containers. Different containers using TGS use different CUDA contexts instead of merging multiple CUDA contexts into one like MPS. Therefore, errors applied within one container will not affect other containers.
TGS的无感知统一显存机制也在不需要修改深度学习框架的情况下,无感知地解决了现有深度学习框架过度请求GPU显存的问题。具体地,所述从统一显存地址空间中,为所述第一函数调用分配显存,可以包括:从所述统一显存地址空间中,为所述第一函数调用分配虚拟第一物理地址;在所述资源保障型任务访问所述第一虚拟物理地址出错时,从所述统一显存地址空间中,为所述第一函数调用分配显存的第一真实物理地址。所述从所述统一显存地址空间中,为所述第二函数调用分配显存或主存,可以包括:从所述统一显存地址空间中,为所述第二函数调用分配虚拟第二物理地址;在所述机会型任务访问所述第二虚拟物理地址出错时,从所述统一显存地址空间中,为所述第二函数调用分配显存的第二真实物理地址或主存的第二真实物理地址。The non-aware unified memory mechanism of TGS also solves the problem of excessive requests for GPU memory by existing deep learning frameworks without modifying the deep learning framework. Specifically, allocating video memory for the first function call from the unified video memory address space may include: allocating a virtual first physical address for the first function call from the unified video memory address space; When an error occurs when the resource guarantee task accesses the first virtual physical address, the first real physical address of the video memory is allocated to the first function call from the unified video memory address space. Allocating video memory or main memory for the second function call from the unified video memory address space may include: allocating a virtual second physical address for the second function call from the unified video memory address space; When an error occurs when the opportunistic task accesses the second virtual physical address, a second real physical address of the video memory or a second real physical address of the main memory is allocated to the second function call from the unified video memory address space. .
When the deep learning framework requests all available GPU memory, TGS allocates the corresponding address range in the unified memory address space, but actual physical allocation happens only after the application first accesses an address and triggers a page fault. The net effect is that only genuinely active data is backed by physical GPU memory, so opportunistic tasks can obtain more GPU memory and GPU sharing becomes more efficient.
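The first-touch behavior can be observed directly with CUDA managed memory on a Pascal-or-newer GPU under Linux; the toy program below (an illustration, not TGS itself) reserves a large range but consumes device memory only for the pages a kernel actually touches.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(char* p, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = 1;  // first touch faults the page in and backs it physically
}

int main() {
    char* p = nullptr;
    size_t free_before, free_after, total;
    const size_t reserved = 4ULL << 30;  // "request" 4 GiB up front, as frameworks do

    cudaMallocManaged(&p, reserved);     // address range only; no device pages yet
    cudaMemGetInfo(&free_before, &total);

    const size_t touched = 256 * 256;    // touch only ~64 KiB of the 4 GiB
    touch<<<256, 256>>>(p, touched);
    cudaDeviceSynchronize();
    cudaMemGetInfo(&free_after, &total);

    // Expect far less than 4 GiB consumed: physical backing followed the
    // first access, not the reservation.
    printf("device memory consumed: %zu bytes\n", free_before - free_after);
    cudaFree(p);
    return 0;
}
```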
The TGS proposed in the embodiments of this disclosure is a system that supports transparent GPU sharing for deep learning training tasks in container clouds. TGS sits at the system layer, so it is invisible to users. It delivers the convenience of application-layer solutions while still allowing users to develop and run tasks inside containers with any version of any deep learning framework.
Building a system-layer GPU sharing solution that guarantees performance isolation raises two main technical challenges. The first is how to adaptively adjust GPU sharing across containers without application knowledge. To solve this, TGS monitors the performance of resource-guaranteed tasks at runtime and adaptively adjusts the resource quota of opportunistic tasks. This control logic converges to an optimal resource allocation in which opportunistic tasks use as much idle GPU capacity as possible without affecting resource-guaranteed tasks.
The second challenge is achieving transparent GPU memory oversubscription. A GPU has its own dedicated memory in which application state resides. This disclosure implements a distinctive transparent unified-memory management mechanism at the system layer, providing a unified memory view without requiring modification of applications or frameworks. When GPU memory is oversubscribed, the mechanism transparently manages swapping data in and out of GPU memory, and TGS uses placement preferences to preserve the performance of resource-guaranteed tasks.
Experiments show that: (1) TGS preserves the throughput of resource-guaranteed tasks; and (2) TGS delivers throughput for opportunistic tasks close to that of AntMan, the state-of-the-art application-layer GPU sharing solution, and improves throughput by up to 15x over the existing system-layer solution MPS.
It should be noted that, for simplicity of description, the method embodiments are expressed as series of combined actions; however, those skilled in the art will appreciate that the embodiments of the present disclosure are not limited by the described order of actions, since according to these embodiments certain steps may be performed in other orders or simultaneously. Those skilled in the art will also appreciate that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present disclosure.
Figure 5 is a schematic structural diagram of a server-less resource processing apparatus in an embodiment of the present disclosure. As shown in Figure 5, the apparatus includes a first acquisition module, a first allocation module, a second acquisition module, a target acquisition module, a rate determination module, and a second allocation module (a hypothetical code skeleton of this structure follows the list), where:
a first acquisition module, configured to acquire a first function call issued by a resource-guaranteed task to a target graphics processor and to acquire the arrival rate of the first function call, where the target graphics processor is one of multiple graphics processors included in a graphics processor cluster, and the arrival rate of the first function call is the rate at which the first function call arrives at a rate monitor;
a first allocation module, configured to allocate GPU memory for the first function call from a unified memory address space according to the arrival rate of the first function call, where the unified memory address space includes the GPU memory of the multiple graphics processors and the main memory of the multiple graphics processors;
a second acquisition module, configured to acquire a second function call issued by an opportunistic task to any of the graphics processors and to acquire the arrival rate of the second function call;
a target acquisition module, configured to acquire a predetermined target rate, where the target rate is the arrival rate of function calls to the target graphics processor when the resource-guaranteed task has exclusive use of the target graphics processor's GPU memory;
a rate determination module, configured to determine the departure rate of the second function call according to the arrival rate of the first function call and the target rate, where the departure rate of the second function call is the rate at which the second function call leaves a rate controller; and
a second allocation module, configured to allocate GPU memory or main memory for the second function call from the unified memory address space according to the departure rate of the second function call.
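To make the wiring of these modules concrete, here is a hypothetical C++ skeleton of the apparatus; every class, member, and function name below is invented for illustration and does not appear in the disclosure.

```cuda
#include <cstddef>

// Hypothetical skeleton of the Figure 5 apparatus; all names are invented.
struct RateMonitor {
    // Arrival rate of function calls from a given container, in calls/s.
    double arrival_rate(int /*container_id*/) { return 0.0; /* stub */ }
};

struct RateController {
    double departure_rate = 0.0;  // throttle applied to opportunistic calls
};

struct ServerlessGpuApparatus {
    RateMonitor    monitor;      // fed by the first/second acquisition modules
    RateController controller;   // updated by the rate determination module
    double         target_rate;  // arrival rate when the guaranteed task runs alone

    // First allocation module: guaranteed task, GPU memory only.
    void* allocate_guaranteed(size_t /*bytes*/)    { return nullptr; /* stub */ }
    // Second allocation module: opportunistic task, GPU or main memory,
    // chosen according to controller.departure_rate.
    void* allocate_opportunistic(size_t /*bytes*/) { return nullptr; /* stub */ }
};
```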
Optionally, the rate determination module is specifically configured to:
determine the maximum departure rate of the second function call under the constraint that the arrival rate of the first function call remains greater than or equal to the target rate; and
determine that maximum value as the departure rate of the second function call.
Optionally, determining the maximum departure rate of the second function call while ensuring that the arrival rate of the first function call remains greater than or equal to the target rate includes:
when the arrival rate of the first function call is greater than or equal to the target rate, linearly increasing the departure rate of the second function call to obtain the maximum departure rate of the second function call; and
when the arrival rate of the first function call is less than the target rate, multiplicatively decreasing the current departure rate of the second function call until the arrival rate of the first function call is no longer below the target rate, thereby obtaining the maximum departure rate of the second function call (see the AIMD sketch after this list).
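This is the classic additive-increase / multiplicative-decrease (AIMD) pattern; a minimal sketch follows, in which the step size, decay factor, and control interval are assumptions rather than disclosed values.

```cuda
#include <algorithm>

// AIMD sketch of the rate-determination rule. r2 is the opportunistic
// departure rate; r1 the measured arrival rate of the guaranteed task;
// target the rate observed when the guaranteed task runs alone.
// Constants are illustrative assumptions.
double next_departure_rate(double r2, double r1, double target) {
    const double kStep  = 1.0;  // additive increase per control interval
    const double kDecay = 0.5;  // multiplicative decrease on interference

    if (r1 >= target)
        return r2 + kStep;              // no interference: probe upward
    return std::max(0.0, r2 * kDecay);  // interference: back off fast
}
```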
Optionally, the first allocation module is specifically configured to:
determine, according to the arrival rate of the first function call, a first GPU memory size required by the first function call;
when the free GPU memory in the unified memory address space is smaller than the first GPU memory size, migrate the opportunistic tasks occupying GPU memory to main memory until the free GPU memory is no smaller than the first GPU memory size; and
allocate GPU memory for the first function call (see the eviction sketch after this list).
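Under the managed-memory assumption used in the earlier sketches, the eviction step could be expressed as advising and prefetching the opportunistic region to the host; `opp_ptr` and `opp_bytes` are hypothetical bookkeeping values tracked elsewhere.

```cuda
#include <cuda_runtime.h>

// Sketch: free up GPU memory for a guaranteed allocation by migrating an
// opportunistic task's managed region to main memory. Illustrative only.
cudaError_t evict_to_host(void* opp_ptr, size_t opp_bytes, cudaStream_t stream) {
    // Prefer the host for this region's future residency decisions...
    cudaMemAdvise(opp_ptr, opp_bytes,
                  cudaMemAdviseSetPreferredLocation, cudaCpuDeviceId);
    // ...and migrate its pages now; the virtual addresses remain valid,
    // so the opportunistic task keeps running, only more slowly.
    return cudaMemPrefetchAsync(opp_ptr, opp_bytes, cudaCpuDeviceId, stream);
}
```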
Optionally, the second allocation module is specifically configured to:
determine, according to the departure rate of the second function call, a second GPU memory size required by the second function call;
when the free GPU memory in the unified memory address space is no smaller than the second GPU memory size, allocate GPU memory for the second function call from the unified memory address space; and
when the free GPU memory in the unified memory address space is smaller than the second GPU memory size, allocate main memory for the second function call from the unified memory address space (see the placement sketch after this list).
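The branch on free GPU memory maps naturally onto cudaMemGetInfo plus a placement advice; again a hedged sketch, with the decision threshold simplified to a direct comparison.

```cuda
#include <cuda_runtime.h>

// Sketch of the second allocation module's decision: satisfy an
// opportunistic request from GPU memory if enough is free, otherwise
// from main memory. Illustrative only.
void* allocate_opportunistic(size_t bytes) {
    size_t free_bytes = 0, total = 0;
    cudaMemGetInfo(&free_bytes, &total);

    void* p = nullptr;
    if (cudaMallocManaged(&p, bytes) != cudaSuccess) return nullptr;

    int device = 0;
    cudaGetDevice(&device);
    // One unified address space either way; only the preferred
    // physical location differs.
    int where = (free_bytes >= bytes) ? device : cudaCpuDeviceId;
    cudaMemAdvise(p, bytes, cudaMemAdviseSetPreferredLocation, where);
    return p;
}
```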
Optionally, before acquiring the first function call issued by the resource-guaranteed task to the target graphics processor, the apparatus further includes:
a determination module, configured to determine the GPU memory of the multiple graphics processors and the main memory of the multiple graphics processors; and
an address space module, configured to determine the unified memory address space based on the GPU memory of the multiple graphics processors and the main memory of the multiple graphics processors.
Optionally, before acquiring the first function call issued by the resource-guaranteed task to the target graphics processor, the apparatus further includes:
a target determination module, configured to determine, as the target rate, the arrival rate of function calls to the target graphics processor when the resource-guaranteed task has exclusive use of the target graphics processor.
Optionally, the first allocation module is specifically configured to: allocate a first virtual physical address for the first function call from the unified memory address space; and, when the resource-guaranteed task faults on accessing the first virtual physical address, allocate a first real physical address in GPU memory for the first function call from the unified memory address space.
The second allocation module is specifically configured to: allocate a second virtual physical address for the second function call from the unified memory address space; and, when the opportunistic task faults on accessing the second virtual physical address, allocate a second real physical address in GPU memory or a second real physical address in main memory for the second function call from the unified memory address space.
It should be noted that the apparatus embodiments are similar to the method embodiments and are therefore described more briefly; for relevant details, refer to the method embodiments.
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be understood by reference to one another.
Those skilled in the art will appreciate that the embodiments of the present disclosure may be provided as methods, apparatuses, or computer program products. Accordingly, the embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The embodiments of the present disclosure are described with reference to flowcharts and/or block diagrams of methods, apparatuses, electronic devices, and computer program products according to the embodiments. It will be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data-processing terminal device to produce a machine, such that the instructions executed by that processor create means for implementing the functions specified in one or more flowchart processes and/or one or more block-diagram blocks.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data-processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flowchart processes and/or one or more block-diagram blocks.
These computer program instructions may also be loaded onto a computer or other programmable data-processing terminal device, causing a series of operational steps to be performed on the device to produce computer-implemented processing, such that the instructions executed on the device provide steps for implementing the functions specified in one or more flowchart processes and/or one or more block-diagram blocks.
Although preferred embodiments of the present disclosure have been described, those skilled in the art, once aware of the basic inventive concepts, may make additional changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the preferred embodiments together with all changes and modifications that fall within the scope of the embodiments of the present disclosure.
Finally, it should be noted that in this document relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", or any variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element introduced by the phrase "comprises a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes it.
The server-less resource processing method, apparatus, and electronic device provided by the present disclosure have been described in detail above. Specific examples are used herein to explain the principles and implementations of the disclosure, and the description of the above embodiments is intended only to aid understanding of the disclosed method and its core ideas. Meanwhile, those of ordinary skill in the art may, following the ideas of the present disclosure, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present disclosure.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310392348.9A CN116401062B (en) | 2023-04-13 | 2023-04-13 | A server-less resource processing method, device and electronic equipment |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN116401062A CN116401062A (en) | 2023-07-07 |
| CN116401062B true CN116401062B (en) | 2023-09-12 |
Family
ID=87007167
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310392348.9A Active CN116401062B (en) | A server-less resource processing method, device and electronic equipment | 2023-04-13 | 2023-04-13 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116401062B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120276874B (en) * | 2025-06-11 | 2025-08-19 | 西安芯云半导体技术有限公司 | Computer video memory sharing method, device, equipment and storage medium |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106233276A (en) * | 2014-03-14 | 2016-12-14 | 亚马逊科技公司 | Coordinated admission control for network-accessible block storage devices |
| CN113434303A (en) * | 2021-08-27 | 2021-09-24 | 湖北星地智链科技有限公司 | Batch-processed remote sensing image intelligent processing model prediction performance optimization system and method |
| CN113837920A (en) * | 2021-08-18 | 2021-12-24 | 荣耀终端有限公司 | Image rendering method and electronic equipment |
| CN115237586A (en) * | 2022-03-24 | 2022-10-25 | 华东师范大学 | GPU resource configuration method for deep learning inference performance interference perception |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10999012B2 (en) * | 2014-11-07 | 2021-05-04 | Strong Force Iot Portfolio 2016, Llc | Packet coding based network communication |
2023-04-13 — CN application CN202310392348.9A filed; granted as patent CN116401062B (en), status: Active
Non-Patent Citations (1)
| Title |
|---|
| A scalable GPU-based parallel computing method for conditional likelihood probabilities of nucleotide molecular phylogenetic trees; Huang Jiawei et al.; Computer Science; full text * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116401062A (en) | 2023-07-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Wu et al. | Transparent GPU sharing in container clouds for deep learning workloads | |
| KR102833381B1 (en) | System and method for offloading application functions to a device | |
| US11093297B2 (en) | Workload optimization system | |
| EP2652615B1 (en) | Graphics compute process scheduling | |
| US9063783B2 (en) | Coordinating parallel execution of processes using agents | |
| EP2652614B1 (en) | Graphics processing dispatch from user mode | |
| JP7546669B2 (en) | Determining the optimal number of threads per core in a multi-core processor complex - Patents.com | |
| WO2012082421A1 (en) | Accessibility of graphics processing compute resources | |
| US10037225B2 (en) | Method and system for scheduling computing | |
| US11831410B2 (en) | Intelligent serverless function scaling | |
| US11954518B2 (en) | User-defined metered priority queues | |
| Hartmann et al. | GPUart: an application-based limited preemptive GPU real-time scheduler for embedded systems | |
| Zhao et al. | GPU-enabled function-as-a-service for machine learning inference | |
| CN116401062B (en) | A server-less resource processing method, device and electronic equipment | |
| CN118819864A (en) | Resource unified scheduling method and system for multiple types of loads | |
| CN117827423A (en) | GPU sharing method and device, electronic equipment and storage medium | |
| CN115858124A (en) | A control method, device, and medium for a baseboard management system | |
| CN120196421A (en) | A method and device for GPU resource virtualization computing power scheduling | |
| EP3853724B1 (en) | I/o completion polling for low latency storage device | |
| US20250245052A1 (en) | Systems and methods for scheduling virtual functions | |
| US20130141446A1 (en) | Method and Apparatus for Servicing Page Fault Exceptions | |
| CN116302478A (en) | Multi-tenant resource allocation method, device, computer equipment and storage medium | |
| US10073723B2 (en) | Dynamic range-based messaging | |
| CN119759525B (en) | Heterogeneous device task migration optimization method, system, equipment, medium and product | |
| US20250307180A1 (en) | Software Runtime Assisted Co-Processing Acceleration with a Memory Hierarchy Augmented with Compute Elements |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |