
CN114625482B - Device management method and device

Device management method and device

Info

Publication number
CN114625482B
CN114625482B
Authority
CN
China
Prior art keywords
container
gpu
vgpu
gpus
api
Prior art date
Legal status
Active
Application number
CN202210294026.6A
Other languages
Chinese (zh)
Other versions
CN114625482A (en)
Inventor
安仲奇
董建波
唐小川
张正俣
Current Assignee
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202210294026.6A priority Critical patent/CN114625482B/en
Publication of CN114625482A publication Critical patent/CN114625482A/en
Application granted granted Critical
Publication of CN114625482B publication Critical patent/CN114625482B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/4557Distribution of virtual machine instances; Migration and load balancing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45575Starting, stopping, suspending or resuming virtual machine instances
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45583Memory management, e.g. access or allocation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45587Isolation or security of virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)
  • Warehouses Or Storage Devices (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract


An embodiment of the present application provides a device management method and apparatus. The method includes: mounting N GPUs on each of a plurality of containers, with a preset link existing between the N GPUs, where N is an integer greater than 1; virtualizing the GPUs that can be called by each container to obtain one or more vGPU instances corresponding to each container; and providing the virtualized vGPU instances to the corresponding container for use. By mounting N GPUs on each container and virtualizing the GPUs that can be called by each container, the isolation of GPUs between containers is ensured while avoiding blocking the preset links between GPUs, thereby allowing communication between GPUs using the preset links.

Description

Device management method and device
Technical Field
The present application relates to the field of computers, and more particularly, to a device management method and apparatus.
Background
With the continued development of computer technology, more and more artificial intelligence (AI) deep learning training tasks are deployed and run in containers, and such training tasks rely heavily on graphics processing units (GPUs).
Currently, when GPUs are mounted on containers, the system or a user configures the GPUs that each container can invoke. The software responsible for managing the GPUs, i.e., the GPU runtime, mounts the corresponding GPUs on each container according to this configuration. Each container can use only the GPUs mounted on it, which ensures isolation between containers.
In some scenarios, such as distributed training, the same tenant may use multiple containers to perform the same task in order to improve execution efficiency. Because data may need to be shared among these containers, high-speed data transmission between them is required; this can be implemented through GPU high-speed interconnect technology, whose communication bandwidth is far higher than that of an ordinary network. However, the containers are isolated from each other, which means the high-speed interconnects between GPUs belonging to different containers cannot be used. Therefore, data transmission between containers is currently implemented mainly through shared memory or network transmission, which may degrade overall performance. For example, in the shared-memory mode the data must pass through system main memory and is copied multiple times, so communication efficiency is low, communication performance is poor, execution efficiency is limited, and the scalability of training is constrained.
Disclosure of Invention
The application provides a device management method and device, which aim to improve communication speed and execution efficiency while realizing container isolation.
In a first aspect, the present application provides a device management method, comprising: mounting N graphics processing units (GPUs) on each of a plurality of containers, wherein a preset link exists between the N GPUs and N is an integer greater than 1; virtualizing the GPUs that each container can call, to obtain one or more vGPU instances corresponding to each container; and providing the virtualized vGPU instances to the corresponding containers for use.
In a second aspect, the present application provides a device management apparatus comprising a control module and a virtualization module. The control module is configured to mount N graphics processing units (GPUs) on each of a plurality of containers, wherein a preset link exists between the N GPUs and N is an integer greater than 1; the virtualization module is configured to virtualize the GPUs that each container can call, to obtain one or more vGPU instances corresponding to each container; and the control module is further configured to provide the virtualized vGPU instances to the corresponding containers for use.
It should be appreciated that each of these modules may implement its function by executing a computer program.
In a third aspect, the present application provides a device management apparatus comprising a processor for executing program code to cause the apparatus to implement the method of the first aspect.
In a fourth aspect, the present application provides a chip comprising at least one processor for implementing the functions referred to in the first aspect, for example, virtualizing a GPU.
In a fifth aspect, the present application provides a computing device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the processor implementing the method of the first aspect described above when executing the computer program.
In a sixth aspect, the present application provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor causes the processor to implement the method of the first aspect described above.
In a seventh aspect, the present application provides a computer program product comprising a computer program which, when executed, implements the method of the first aspect described above.
According to the above scheme, on one hand, the virtualized vGPU instances are provided to the containers through virtualization technology according to the GPUs that each container can call, so that each container can access only the GPUs it is allowed to call, which ensures GPU isolation between containers. On the other hand, the N GPUs are mounted on every container, so that starting a container does not invalidate the preset links between GPUs mounted on different containers; that is, the preset links between GPUs are not blocked, and the GPUs are allowed to communicate over the high-speed interconnect. Because the communication efficiency of shared memory or network transmission is far lower than that of the preset link, the communication speed can be greatly improved, execution efficiency is improved, and good communication performance is ensured.
Drawings
FIG. 1 is a schematic diagram of communication between GPUs provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart of a device management method provided by an embodiment of the present application;
FIG. 3 is another schematic diagram of communication between GPUs provided by an embodiment of the present application;
FIG. 4 is a schematic block diagram of a device management apparatus according to an embodiment of the present application;
Fig. 5 is another schematic block diagram of a device management apparatus provided in an embodiment of the present application.
Detailed Description
The technical scheme of the application will be described below with reference to the accompanying drawings.
The technical solution provided by the present application can be applied in the fields of artificial intelligence (AI) and deep learning (DL). AI is a technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. DL is a research direction in the field of machine learning (ML) that was introduced to bring machine learning closer to its original goal, AI.
Fig. 1 is a schematic diagram of communication between GPUs, which is suitable for use in embodiments of the present application.
In the communication scenario shown in fig. 1, two containers are started, container 0 and container 1. Each container contains several work processes (workers) and a corresponding communication library: container 0 contains work process 0, work process 1, and work process 2, and container 1 contains work process 3 and work process 4. It should be appreciated that multiple work processes may constitute a collective process group that completes the same task in parallel. GPUs are mounted on each container; as shown in fig. 1, GPU0, GPU1, and GPU2 are mounted on container 0, and GPU3 and GPU4 on container 1. Each GPU may be provided to a corresponding work process: for example, GPU0 to work process 0, GPU1 to work process 1, GPU2 to work process 2, GPU3 to work process 3, and GPU4 to work process 4. A work process uses its GPU by way of calls, so the work processes using the GPUs may need to perform collective communication frequently.
It should be appreciated that a container may be a collection of processes that isolate other resources of the system, with its own independent view of resources.
It should also be appreciated that mounting is the process of exposing certain GPUs of a host to a container so that the container can access and use those GPUs.
It should also be appreciated that collective communication is a form of group-based inter-process communication. It differs from point-to-point communication in that it requires all processes within a particular group to participate in the communication simultaneously, which may be one-to-many, many-to-one, or many-to-many. The communication library referred to in the present application is a communication library for collective communication. For details of collective communication and communication libraries, reference may be made to the prior art, which is not described herein.
Currently, when GPUs are mounted on container 0 and container 1, the system or the user configures for each container the GPUs it can use; for example, GPU0, GPU1, and GPU2 are configured for container 0, and GPU3 and GPU4 for container 1. The GPU runtime then mounts GPU0 to GPU2 on container 0, and GPU3 and GPU4 on container 1. Container 0 may thus access GPU0 through GPU2, and container 1 may access GPU3 and GPU4. Since container 0 and container 1 are isolated from each other, it is guaranteed at the GPU runtime level that container 0 cannot access GPU3 and GPU4 inside container 1, and container 1 cannot access GPU0 through GPU2 inside container 0.
However, researchers have found that because container 0 and container 1 are isolated from each other, GPU0, GPU1, and GPU2 in container 0 cannot use GPU high-speed interconnect technology to communicate at high speed with GPU3 and GPU4 in container 1; that is, high-speed GPU communication across containers cannot be achieved. Data transfer between the containers can still be implemented by means of shared memory, network transfer, and the like; for example, GPU1 and GPU4 may exchange data through shared memory. But this may affect overall performance: in the shared-memory mode the data must pass through system main memory and is copied multiple times, so communication efficiency is low, communication performance is poor, and the scalability of training is limited.
In view of the above, the present application provides a device management method. On one hand, according to the GPUs that each container can call, virtualized vGPU instances are provided to the containers through virtualization technology, so that each container can access only the GPUs it is allowed to call, which ensures GPU isolation between containers. On the other hand, the N GPUs are mounted on every container, so that starting a container does not invalidate the preset links between GPUs mounted on different containers; that is, the preset links between GPUs are not blocked, and the GPUs are allowed to communicate over the high-speed interconnect. If the preset link is designed as a high-speed communication link for inter-GPU communication, then because the communication efficiency of shared memory or network transmission is far lower than that of such a link, communication speed can be greatly improved, execution efficiency can be improved, and overall performance can be improved.
The device management method provided by the embodiment of the application is described in detail below with reference to the accompanying drawings.
Referring to fig. 2, fig. 2 is a schematic flowchart of a device management method according to an embodiment of the present application. The method 200 shown in FIG. 2 may be applied to a CPU or to system software on a CPU. The system software may include an operating system and a resource scheduling system, wherein the resource scheduling system is operable for resource scheduling of the GPU.
The method 200 shown in fig. 2 may include steps 201 to 203. The steps in the method 200 shown in fig. 2 are described in detail below using the method 200 as applied to a CPU.
In step 201, N GPUs are mounted on each of the plurality of containers, where a preset link exists between the N GPUs, and N is an integer greater than 1.
The preset link may be used for communication between GPUs. In the embodiment of the present application, the preset link can be designed as a high-speed communication link for inter-GPU communication whose transmission bandwidth is far higher than that of an ordinary network, so it can provide a much higher communication speed. By way of example and not limitation, the high-speed communication link may be NVIDIA's NVLink, or a high-speed communication link provided by another GPU vendor for inter-GPU communication.
When the CPU starts a container, it may mount N GPUs on the container, where the N GPUs are all of the GPUs configured for the plurality of containers. The plurality of containers may be, for example, multiple containers used by the same tenant to perform the same task. Because all of the GPUs are mounted on every container, starting a container does not invalidate the high-speed communication links between the mounted GPUs; the links are not blocked, so they can be used for data transmission between GPUs.
As shown in fig. 3, the CPU starts two containers, container 0 and container 1, and mounts GPU0 to GPU3 on each of them. For both container 0 and container 1, no isolation exists among GPU0, GPU1, GPU2, and GPU3, and the high-speed communication links between the GPUs remain in place. Thus, high-speed interconnection among GPU0, GPU1, GPU2, and GPU3 can be achieved.
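The mounting step can be sketched in a few lines of Python. This is only an illustration of the idea, not the patented implementation: the use of Docker-style device flags, the NVIDIA device-node paths, and the `start_container` helper are all assumptions made for the sketch.

```python
import subprocess

# All GPUs configured for this group of containers (the "N GPUs").
ALL_GPUS = [0, 1, 2, 3]

def start_container(name: str, image: str) -> None:
    """Start a container with every configured GPU mounted, so the
    preset links (e.g. NVLink) between GPUs are never severed."""
    cmd = ["docker", "run", "-d", "--name", name]
    for i in ALL_GPUS:
        # Expose each GPU device node to the container.
        cmd += ["--device", f"/dev/nvidia{i}"]
    # The control and unified-memory nodes are also needed on NVIDIA GPUs.
    cmd += ["--device", "/dev/nvidiactl", "--device", "/dev/nvidia-uvm", image]
    subprocess.run(cmd, check=True)

# Both containers mount GPU0..GPU3, matching fig. 3.
for name in ("container0", "container1"):
    start_container(name, "training-image:latest")
```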
Alternatively, the GPU mounted by each of the plurality of containers may be configured by the system or by the user.
Specifically, the mountable GPUs may be configured for the containers by the system or by a user. Here the system may be the resource scheduling system on the CPU, which can be used to configure the mountable GPUs for the containers. It should be appreciated that the resource scheduling system may be provided in the system software of the CPU.
The following exemplarily gives a specific procedure for configuring a mountable GPU for a container.
For example, a resource scheduling system or user may uniformly configure a mountable GPU for multiple containers. For example, both container 0 and container 1 may be configured by the resource scheduling system on the CPU to mount GPU0, GPU1, GPU2, and GPU3. Thus, the CPU mounts GPU0 to GPU3 for both container 0 and container 1 according to the configured GPUs.
Step 202, virtualizing the GPUs that each container can call to obtain one or more vGPU instances corresponding to each container.
The GPUs that each container can call are the GPUs that the container is actually able to use. For example, container 0 in fig. 3 may call GPU0, GPU1, and GPU2, while container 1 may call GPU3. Which GPUs each container can call may be configured through a mapping relationship between containers and GPUs, set up manually by a user or by the resource scheduling system. The mapping relationship may be configured separately for each container, in which case each container's mapping relationship indicates the GPUs that container can invoke; or it may be configured uniformly for all containers, in which case the single mapping relationship indicates the GPUs that each of the plurality of containers can invoke.
When the mapping relationship is configured separately for each container, the resource scheduling system or the user generates one mapping relationship per container. For example, the resource scheduling system generates mapping relationship #1 for container 0, indicating that the GPUs container 0 can call are GPU0, GPU1, and GPU2, and mapping relationship #2 for container 1, indicating that the GPU container 1 can call is GPU3. It should be understood that mapping relationship #1 and mapping relationship #2 are specific examples of the mapping relationship. When configured individually, the mapping relationships of different containers differ from one another.
When configured uniformly for all containers, the resource scheduling system or the user generates a single mapping relationship for all containers. For example, the mapping relationship indicates that the GPUs container 0 can call are GPU0, GPU1, and GPU2, and the GPU container 1 can call is GPU3. In other words, this mapping relationship is the union of mapping relationship #1 and mapping relationship #2.
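As an illustration only (the names are hypothetical), the uniform mapping relationship above could be a single table consulted by every container's runtime:

```python
# Uniform mapping relationship: container name -> callable GPU indices.
# This single table is the union of mapping #1 and mapping #2 above.
CALLABLE_GPUS: dict[str, list[int]] = {
    "container0": [0, 1, 2],
    "container1": [3],
}

def callable_gpus(container: str) -> list[int]:
    """Look up the GPUs that a given container is allowed to call."""
    return CALLABLE_GPUS[container]
```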
After learning from the mapping relationship which GPUs each container can call, the CPU can determine those callable GPUs among the N GPUs mounted on the container and virtualize them.
It should be understood that GPU virtualization refers to packaging a single GPU device into several logical vGPU instances for concurrent use by different work processes.
Optionally, each container of the plurality of containers includes one or more work processes, each work process being provided with one or more vGPU instances.
When virtualizing a GPU, whether it becomes one vGPU instance or several can be determined according to the GPU's specification. A GPU whose specification is low and which cannot meet the requirements of multiple work processes at the same time can be virtualized into a single vGPU instance.
For example, in fig. 3, the CPU learns from the mapping relationship that the GPUs callable by container 0 are GPU0, GPU1, and GPU2, while the GPU callable by container 1 is GPU3. Although GPU0 to GPU3 are mounted on both container 0 and container 1, for container 0 the CPU virtualizes GPU0 to GPU2 but not GPU3, and for container 1 it virtualizes GPU3 but not GPU0 to GPU2. Assuming the specifications of GPU0, GPU1, and GPU2 are low and each cannot serve multiple work processes at the same time, GPU0, GPU1, and GPU2 can be virtualized into vGPU instance-0, vGPU instance-1, and vGPU instance-2 respectively; assuming the specification of GPU3 is high and it can serve multiple work processes at the same time, GPU3 can be virtualized into vGPU instance-3 and vGPU instance-4. Thus, work process 0 in container 0 may be provided with vGPU instance-0, work process 1 with vGPU instance-1, and work process 2 with vGPU instance-2, while work process 3 in container 1 may be provided with vGPU instance-3 and work process 4 with vGPU instance-4.
It can be seen that although the GPUs themselves are not isolated from the containers, the containers' calls to GPU resources remain isolated from one another through the virtualization technique.
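The decision of how many vGPU instances to carve out of one physical GPU can be sketched as follows; the sizing rule (splitting by memory) and all names are assumptions made for illustration, not the patented implementation.

```python
from dataclasses import dataclass

@dataclass
class VGPUInstance:
    gpu_index: int    # physical GPU backing this logical instance
    slot: int         # which slice of the GPU this instance is
    memory_mb: int    # memory budget granted to the slice

def virtualize(gpu_index: int, total_mem_mb: int,
               per_worker_mem_mb: int) -> list[VGPUInstance]:
    """Split one physical GPU into as many vGPU instances as its
    capacity allows; a low-specification GPU yields one instance."""
    n = max(1, total_mem_mb // per_worker_mem_mb)
    return [VGPUInstance(gpu_index, i, total_mem_mb // n) for i in range(n)]

# GPU0..GPU2 are small: one instance each (vGPU instance-0/-1/-2).
small = virtualize(gpu_index=0, total_mem_mb=16384, per_worker_mem_mb=16384)
# GPU3 is large enough for two work processes (vGPU instance-3/-4).
large = virtualize(gpu_index=3, total_mem_mb=32768, per_worker_mem_mb=16384)
```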
The vGPU instances may be obtained by the vGPU runtime virtualizing each GPU based on the mapping relationship described above. The vGPU runtime can be understood as software that virtualizes GPUs and manages the resulting vGPU instances.
One possible implementation is to inject a vGPU runtime for each container, which is used to virtualize the callable GPU.
For example, when starting the containers, the CPU may inject a vGPU runtime into each container, and by invoking the vGPU runtime in each container it may virtualize the GPUs that the container can call into one or more vGPU instances according to the mapping relationship described above. In a particular implementation, the CPU may inject the vGPU runtime into each container by mounting a host volume.
Optionally, the method further comprises providing the above mapping relationship to the vGPU runtime of each container, the mapping relationship being used to indicate the GPU that each container can invoke.
For example, after generating the mapping relationship, the resource scheduling system in the CPU may provide the mapping relationship to the vGPU runtime, which virtualizes the GPUs that the container can call according to it. The resource scheduling system may provide the mapping relationship to the vGPU runtime in the form of a configuration file, environment variables, command-line parameters, or the like.
As described above, the mapping relationship may be configured individually for each container or uniformly for all containers. When configured individually, the resource scheduling system provides each vGPU runtime only with the mapping relationship of its own container, so the contents received by different vGPU runtimes differ. When configured uniformly, the resource scheduling system provides the same mapping relationship to every vGPU runtime, and each vGPU runtime looks up in that mapping relationship the GPUs that its own container can call.
It should be appreciated that the mapping relationship may also be provided by the user.
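One of the delivery forms mentioned above (environment variables) could look like the following sketch. The variable name `VGPU_CALLABLE_GPUS` and the JSON encoding are hypothetical; in practice the scheduler would set the variable in the environment each container is launched with, while here both sides run in one process for readability.

```python
import json
import os

# Scheduler side: serialize the uniform mapping into the environment
# (in reality, into the launch environment of each container).
mapping = {"container0": [0, 1, 2], "container1": [3]}
os.environ["VGPU_CALLABLE_GPUS"] = json.dumps(mapping)

# vGPU-runtime side (inside a container): read the mapping back and
# look up the GPUs that this container is allowed to call.
def my_callable_gpus(container: str) -> list[int]:
    table = json.loads(os.environ["VGPU_CALLABLE_GPUS"])
    return table[container]

assert my_callable_gpus("container1") == [3]
```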
Step 203: providing the virtualized vGPU instances to the corresponding containers for use.
After the CPU virtualizes the GPU that the container can call by calling the vGPU runtime, the container may use the vGPU instance in the following manner. It should be appreciated that the use of the vGPU instance by the container may specifically be the use of the vGPU instance by a work process in the container.
It should be appreciated that since the vGPU instances are virtualized by the GPU, the use of the vGPU instances corresponding to the GPUs by the work processes in the container is also equivalent to the use of the GPUs by the work processes in the container.
Optionally, step 203 may specifically include: hijacking, by the vGPU runtime injected into each container, calls to a first application program interface (API), and providing a second API.
The first API may be an API provided by the GPU vendor, specifically a GPU user-mode API or a GPU kernel driver API. The second API is an API provided by the vGPU runtime whose name and external appearance are exactly the same as those of the vendor-provided API, and it is used to call the vGPU instances within each container.
It should be appreciated that hijacking a call to an API can be understood as modifying the entry point of the original API so that it jumps to another API. In the embodiment of the present application, a call to the first API is redirected to the second API by hijacking the call to the first API and providing the second API. Specifically, when a work process in a container calls the first API, the vGPU runtime can intercept the call and provide the second API to the work process.
Specifically, when a work process in a container is about to use a GPU, it normally calls the vendor-provided API, i.e., the first API. The CPU controls the vGPU runtime in the container to intercept this call, and the vGPU runtime provides the work process with the API for calling the vGPU instances in the container, i.e., the second API. Because the first and second APIs look identical, the work process is induced to call the second API, through which it uses the corresponding vGPU. The vGPU runtime may hijack the vendor-provided APIs either at the GPU user-mode API level or at the GPU kernel driver API level.
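The hijacking idea can be illustrated with a toy first API. In practice the redirection happens via symbol interposition (for example an LD_PRELOAD-style shim) at the user-mode or kernel-driver level; the Python monkey-patch below is only a stand-in for that mechanism, and every name in it is hypothetical.

```python
class vendor_lib:
    """Toy stand-in for the vendor-provided first API."""
    @staticmethod
    def gpu_malloc(nbytes: int) -> str:
        return f"allocated {nbytes} bytes on the physical GPU"

class VGPURuntime:
    """Sketch of a vGPU runtime hijacking the first API."""
    def __init__(self, budget: int):
        self.budget = budget                     # vGPU instance memory budget
        self.used = 0
        self._first_api = vendor_lib.gpu_malloc  # keep the real entry point

    def second_api_malloc(self, nbytes: int) -> str:
        # The second API looks like the first, but enforces the budget
        # of this container's vGPU instance before forwarding.
        if self.used + nbytes > self.budget:
            raise MemoryError("vGPU instance budget exceeded")
        self.used += nbytes
        return self._first_api(nbytes)

    def hijack(self) -> None:
        # Re-point the first API's entry so a work process calling
        # what it believes is the vendor API lands on the second API.
        vendor_lib.gpu_malloc = self.second_api_malloc

runtime = VGPURuntime(budget=8 << 30)
runtime.hijack()
print(vendor_lib.gpu_malloc(1 << 20))  # transparently served by the vGPU
```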
Optionally, the vGPU runtime provides the functionality to inject the mapping or modify GPU runtime environment variables.
The GPU runtime can be understood as software that manages the GPU. Environment variables are parameters in the operating system that specify the operating environment. The environment variables involved in the embodiments of the present application originally describe the GPUs mounted on each container, for example "CUDA_VISIBLE_DEVICES" and "HIP_VISIBLE_DEVICES". It will be appreciated that in this embodiment the GPUs mounted on each container are the N GPUs described above. By modifying such an environment variable, what it presents becomes only the GPUs that the container can call.
In one example, the vGPU runtime may inject the mapping relationship into the second API, and a work process in the container may use the corresponding vGPU instance by calling the second API.
For example, if the vGPU runtime of container 0 injects into the second API the mapping relationship indicating that container 0 can call GPU0 to GPU2, then work process 0 may use vGPU instance-0, work process 1 vGPU instance-1, and work process 2 vGPU instance-2, each by calling the second API. That is, through calls to the second API, work process 0 may use GPU0, work process 1 GPU1, and work process 2 GPU2. Likewise, if the vGPU runtime of container 1 injects into the second API the mapping relationship indicating that container 1 can call GPU3, then work process 3 may use vGPU instance-3 and work process 4 may use vGPU instance-4 by calling the second API. That is, work process 3 and work process 4 multiplex GPU3.
As another example, all of the configured GPUs are mounted on each container, and the vGPU runtime virtualizes only the GPUs the container is allowed to use. Because all configured GPUs are mounted in the container, the container can in principle access all of them, and for the non-virtualized GPUs a container might bypass the second API to reach them. For example, although GPU0 to GPU3 are mounted in container 0 and only GPU0 to GPU2 are virtualized, container 0 could still access GPU3 and might, through an illegitimate path that bypasses the second API, use the GPU3 that should be used by container 1. Likewise, container 1 also mounts GPU0 to GPU3; although only GPU3 is virtualized for it, container 1 could access GPU0 to GPU2 and might bypass the second API to use the GPUs that should be used by container 0. If this occurs, isolation between container 0 and container 1 is not guaranteed.
Thus, to avoid this, the vGPU runtime may provide the capability of modifying GPU runtime environment variables, changing the non-virtualized GPUs from "accessible" to "inaccessible" for the container. For example, modifying the environment variable describing the GPUs mounted on container 0 so that GPU3 becomes "inaccessible" means container 0 cannot use GPU3. Likewise, modifying the environment variable describing the GPUs mounted on container 1 so that GPU0 to GPU2 become "inaccessible" means container 1 cannot use GPU0 to GPU2.
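A sketch of the environment-variable approach, using the real CUDA variable CUDA_VISIBLE_DEVICES (which must be set before the GPU runtime initializes in the process). How the patented vGPU runtime actually applies it is not specified, so the helper below is an assumption.

```python
import os

def restrict_visible_gpus(callable_gpus: list[int]) -> None:
    """Expose only the container's callable GPUs to the GPU runtime,
    making the other mounted GPUs "inaccessible". Must run before the
    GPU runtime (e.g. CUDA) is initialized in this process."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, callable_gpus))

# Inside container 0's vGPU runtime: GPU3 becomes inaccessible.
restrict_visible_gpus([0, 1, 2])
# Inside container 1's vGPU runtime: GPU0..GPU2 become inaccessible.
# restrict_visible_gpus([3])
```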
It can be seen that although both container 0 and container 1 mount GPU0 to GPU3, thanks to the isolation guaranteed by the vGPU runtime each container can use only the vGPU instances corresponding to the GPUs it is allowed to call: container 0 cannot use container 1's GPU3, and container 1 cannot use container 0's GPU0 to GPU2. Isolation between container 0 and container 1 is thus guaranteed at the vGPU runtime level.
It was noted above that when a GPU's specification is high, it may be virtualized into multiple vGPU instances for use by a container. However, when the entire control logic of the communication library is driven by the GPU, deadlock may occur. Specifically, when the communication library is GPU-driven and GPU resources are scarce or GPU utilization is high, i.e., the GPU has few available resources, if multiple work processes in a container multiplex one GPU and there are interdependencies among them, deadlock is very likely to occur.
It should be understood that interdependence may mean that one work process (denoted work process a) running on the GPU needs to wait for a signal from another work process (denoted work process b) before it can continue, while work process b can only be scheduled after work process a releases the GPU's resources. Work process b can only signal work process a after it has been scheduled and runs on the GPU, but the GPU's long-resident persistent-kernel mode dictates that work process a releases its resources only after completing its task. Thus work process a waits for work process b's signal while work process b waits for work process a to release resources; the two wait for each other, and deadlock occurs. The long-resident persistent-kernel mode specifically means that a work process running on the GPU releases its resources only after its execution ends.
It should also be appreciated that deadlock does not occur when the GPU has available resources.
For example, as shown in fig. 3, the vGPU runtime in container 1 virtualizes GPU3 into vGPU instance-3 and vGPU instance-4, which are provided to work process 3 and work process 4 respectively, so both essentially use GPU3. Suppose work process 3 runs on GPU3 first and, at a certain node, needs a signal from work process 4 to continue. If GPU3's resources are already fully occupied at this time, work process 4 cannot run on GPU3. Since work process 4 cannot run, it cannot send the signal work process 3 needs, so work process 3 is left waiting. Work process 4, in turn, can run on GPU3 only after GPU3 releases the resources occupied by work process 3. But work process 3 is waiting for work process 4's signal and GPU3 cannot release the resources, so work process 4 also waits indefinitely. Work process 3 and work process 4 thus wait for each other, and deadlock occurs.
Thus, to avoid the deadlock that may arise when multiple work processes multiplex one GPU, the method 200 may further include:
scheduling a work process based on control logic in the communication library such that the work process invokes resources in the CPU for computation in response to the scheduling of the CPU.
Specifically, the control logic of the communication library is offloaded from the GPU to the CPU, and the operating system on the CPU is responsible for scheduling the work processes based on that control logic. Because an operating system runs on the CPU, resource exhaustion is far less likely, so interdependent work processes can be scheduled on the CPU for computation. On the other hand, unlike the GPU's long-resident persistent-kernel mode, with the CPU controlling the communication logic a work process that depends on an external condition does not have to hold resources while it waits; it can finish running once resources become available, which removes the situation where a process occupies GPU resources while waiting for another process that cannot be scheduled. Instead, even when resources are tight, the CPU can schedule the interdependent work processes in turn. The deadlock caused by interdependent work processes waiting for each other can thus be avoided.
In one implementation, all work processes are handed to the CPU for processing; that is, both interdependent and independent work processes are scheduled by the CPU.
For interdependent work processes, the CPU may employ a round-robin scheduling mechanism. For example, assuming work process 3 and work process 4 in container 1 are interdependent, the operating system on the CPU may first schedule work process 3; when work process 3 reaches a node where it needs a signal from work process 4 to continue and the CPU's resources are currently occupied, the CPU can schedule work process 3 out and schedule work process 4 in. The now-running work process 4 can give work process 3 the signal it needs, after which the CPU schedules work process 4 out and work process 3 back in so that it continues running. Cycling in this way, work process 3 and work process 4 are scheduled to run in turn, which avoids the problem of resources occupied by one work process never being released, i.e., avoids deadlock.
For independent work processes, the CPU may schedule according to resource occupancy. For example, assuming work process 3 and work process 4 in container 1 are independent of each other, the operating system on the CPU may first schedule work process 3 and let it run. If the CPU still has spare resources, it may go on to schedule work process 4; if not, it may either schedule work process 3 out and schedule work process 4 in, or schedule work process 4 after work process 3 finishes running and releases its resources.
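The round-robin policy can be modeled with cooperative coroutines standing in for work processes 3 and 4. Real scheduling is done by the operating system, so this is only a sketch of the policy, with the peer signaling deliberately simplified.

```python
from collections import deque

def worker(name: str, signals: dict):
    """A work process that runs a first phase, then must wait for a
    signal from its peer before it can finish (the interdependence
    described above)."""
    print(f"{name}: running first phase")
    while not signals.get(name):
        yield                      # yield the CPU instead of holding it
    print(f"{name}: got peer signal, finishing")

signals: dict = {}
ready = deque([("wp3", worker("wp3", signals)),
               ("wp4", worker("wp4", signals))])

# Round-robin: alternate the interdependent work processes so neither
# pins the execution resource while waiting for the other (unlike a
# GPU persistent kernel, which holds its resources until it is done).
while ready:
    name, task = ready.popleft()
    peer = "wp4" if name == "wp3" else "wp3"
    try:
        next(task)                 # run one step of this work process
        ready.append((name, task))
    except StopIteration:
        pass                       # this work process has completed
    signals[peer] = True           # simplified: each step signals the peer
```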
In another implementation, the interdependent work processes are handed to the CPU for processing and the independent work processes are handed to the GPU. The manner in which the CPU processes the interdependent work processes is the same as described above and is not repeated here. As for the GPU's handling of independent work processes: since the GPU lacks the ability to schedule work processes alternately, the earlier work process runs first, and the later one runs after the former releases its resources.
It should also be appreciated that only the control logic is placed on the CPU; the application does not limit the specific executor of operations such as data transfer, which may be a direct memory access (DMA) engine, a network card, a GPU program, or the CPU itself.
According to the above scheme, on one hand, the virtualized vGPU instances are provided to the containers through virtualization technology according to the GPUs that each container can call, so that each container can access only the GPUs it is allowed to call, which ensures GPU isolation between containers. On the other hand, the N GPUs are mounted on every container, so that starting a container does not invalidate the preset links between GPUs mounted on different containers; that is, the preset links between GPUs are not blocked, and the GPUs are allowed to communicate using the preset links. Since a preset link between GPUs may be designed as a high-speed communication link, and the communication efficiency of shared memory or network transmission is far lower than that of such a link, the communication speed can be greatly improved, execution efficiency is improved, and good communication performance is ensured. In addition, by offloading the control logic of the collective communication library from the GPU to the CPU, the deadlock that easily arises when multiple work processes multiplex the same GPU is avoided.
The method provided by the embodiment of the application is described in detail above with reference to fig. 2 to 3. The following describes in detail the apparatus provided in the embodiment of the present application with reference to fig. 4 to 5.
Fig. 4 is a schematic block diagram of an apparatus provided by an embodiment of the present application. As shown in fig. 4, the apparatus 400 may include a control module 410 and a virtualization module 420. The modules in the apparatus 400 may be used to implement the corresponding flow of the CPU in the method 200 shown in fig. 2. For example, control module 410 may be used to perform steps 201 and 203 in method 200 and virtualization module 420 may be used to perform step 202 in method 200.
Specifically, the control module 410 may be configured to mount N GPUs on each of a plurality of containers, where a preset link exists between the N GPUs and N is an integer greater than 1; the virtualization module 420 may be configured to virtualize the GPUs that each container can call, to obtain one or more vGPU instances corresponding to each container; and the control module 410 is further configured to provide the virtualized vGPU instances to the corresponding containers for use.
Optionally, the control module 410 may be further configured to inject a vGPU runtime for each container, where the vGPU runtime injected into each container is used to virtualize the callable GPUs.
Alternatively, the control module 410 may be specifically configured to hijack, via the vGPU runtime injected into each container, calls to a first application program interface (API) and provide a second API for calling the vGPU instances within each container, where the first API is a GPU user-mode API or a GPU kernel driver API provided by a GPU vendor.
Optionally, the vGPU runtime provides the functionality to inject mappings or modify GPU runtime environment variables.
Optionally, the control module 410 may be further configured to provide a mapping relationship to the vGPU runtime of each container, where the mapping relationship is used to indicate GPUs that each container can invoke.
Optionally, the GPU mounted by each container of the plurality of containers is configured by the system or by a user.
Optionally, each container of the plurality of containers includes one or more work processes, each work process being provided with one or more vGPU instances.
Optionally, the control module 410 may be further configured to schedule a work process based on control logic in the communication library, such that the work process invokes resources in the CPU for computation in response to the scheduling of the CPU.
It should be understood that the division of the modules in the embodiment of the present application is illustrative, and is merely a logic function division, and other division manners may be implemented in practice. In addition, each functional module in the embodiments of the present application may be integrated in one processor, or may exist alone physically, or two or more modules may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules.
Fig. 5 is another schematic block diagram of an apparatus provided by an embodiment of the present application. The apparatus 500 may be used to implement the functions of the CPU in the method 200 described above. The apparatus 500 may be a system-on-chip. In the embodiment of the application, the chip system can be formed by a chip, and can also comprise the chip and other discrete devices.
As shown in fig. 5, the apparatus 500 may include at least one processor 510 for implementing the CPU functions in the method 200 provided in the embodiment of the present application.
Illustratively, when the apparatus 500 is used to implement the functions of the CPU in the method 200 provided by the embodiment of the present application, the processor 510 may be configured to: mount N GPUs on each of a plurality of containers, where a preset link exists between the N GPUs and N is an integer greater than 1; virtualize the GPUs that each container can call to obtain one or more vGPU instances corresponding to each container; and provide the virtualized vGPU instances to the corresponding containers for use. Reference may be made to the detailed description in the method embodiments, which is not repeated here.
The apparatus 500 may also include at least one memory 520 for storing program instructions and/or data. The memory 520 is coupled to the processor 510. The coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units, or modules, which may be electrical, mechanical, or in other forms, and is used for information exchange between them. The processor 510 may operate in cooperation with the memory 520 and may execute the program instructions stored in it. At least one of the at least one memory may be integrated in the processor.
The apparatus 500 may also include a communication interface 530 for communicating with other devices over a transmission medium. The communication interface 530 may be, for example, a transceiver, an interface, a bus, a circuit, or any device capable of transmitting and receiving. The processor 510 may transmit and receive data and/or information through the communication interface 530, and may be used to implement the method performed by the CPU in the embodiment corresponding to fig. 2.
The specific connection medium between the processor 510, the memory 520, and the communication interface 530 is not limited in the embodiments of the present application. In fig. 5, the processor 510, the memory 520, and the communication interface 530 are connected by a bus, shown as a bold line; the connections between other components are merely illustrative and not limiting. The bus may be classified into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 5, but this does not mean there is only one bus or one type of bus.
It should be appreciated that the processor in the embodiments of the present application may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method embodiments may be completed by integrated logic circuits of hardware in the processor or by instructions in software form. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps, and logical blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
It should also be appreciated that the memory in embodiments of the present application may be volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The application also provides a chip comprising at least one processor for implementing the functions involved in the CPU in the embodiment shown in fig. 2.
In one possible design, the chip may further include a memory for holding program instructions and data, the memory being located within the processor or external to the processor.
The application also provides a computing device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, which processor implements the method of the embodiment shown in fig. 2 when executing the computer program.
The present application also provides a computer-readable storage medium storing a computer program (which may also be referred to as code, or instructions). The computer program, when executed, causes the computer to perform the method of the embodiment shown in fig. 2.
The application also provides a computer program product comprising a computer program which, when executed, implements the method of the embodiment shown in fig. 2.
The terms "unit," "module," and the like as used in this specification may be used to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution.
Those of ordinary skill in the art will appreciate that the various illustrative logical blocks and steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware or as combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. In the several embodiments provided by the present application, it should be understood that the disclosed apparatus, device, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a logical function division, and there may be other divisions in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
In the above-described embodiments, the functions of the respective functional units may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions (programs). When the computer instructions are loaded and executed on a computer, the processes or functions according to embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., digital video disc (DVD)), or semiconductor media (e.g., solid state disk (SSD)).
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, essentially or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method of the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The foregoing is merely a specific implementation of the present application, and the protection scope of the present application is not limited thereto; any variation or substitution readily conceivable by a person skilled in the art shall fall within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A device management method, characterized in that the method comprises: mounting the N graphics processing units (GPUs) corresponding to all containers on each of a plurality of containers, wherein preset links exist between the N GPUs, the GPUs communicate with one another over a high-speed interconnect, the GPUs corresponding to each container are the same, and N is an integer greater than 1; virtualizing the GPUs that each container is able to call, to obtain one or more vGPU instances corresponding to each container; and providing the vGPU instances obtained by virtualizing the GPUs to the corresponding different containers for use, wherein the vGPU runtime provides a function of injecting mapping relationships or a function of modifying GPU runtime environment variables.
2. The method according to claim 1, characterized in that the method further comprises: injecting a vGPU runtime into each container, the vGPU runtime injected into each container being used to virtualize the GPUs that can be called.
3. The method according to claim 2, characterized in that providing the vGPU instances obtained by virtualization to the corresponding containers for use comprises: hijacking, by the vGPU runtime injected into each container, calls to a first application programming interface (API) and providing a second API, wherein the first API is a GPU user-mode API or a GPU kernel driver API provided by a GPU vendor, and the second API is used to call the vGPU instances within each container.
4. The method according to claim 2 or 3, characterized in that the method further comprises: providing a mapping relationship to the vGPU runtime of each container, the mapping relationship being used to indicate the GPUs that each container is able to call.
5. The method according to claim 1, characterized in that the GPUs mounted on each of the plurality of containers are configured by the system or by a user.
6. The method according to claim 1, characterized in that each of the plurality of containers comprises one or more worker processes, and each worker process is provided with one or more vGPU instances.
7. The method according to claim 6, characterized in that the method is applied to a central processing unit (CPU) and further comprises: scheduling a worker process based on control logic in a communication library, so that the worker process, in response to scheduling by the CPU, calls resources in the CPU to perform computation.
8. A device management apparatus, characterized by comprising modules for performing the method according to any one of claims 1 to 7.
9. A device management apparatus, characterized by comprising a processor, the processor being configured to execute program code so that the apparatus implements the method according to any one of claims 1 to 7.
10. A chip, characterized by comprising at least one processor configured to implement the functions involved in the method according to any one of claims 1 to 7.
11. A computing device, characterized by comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1 to 7 when executing the computer program.
12. A computer program product, characterized by comprising a computer program which, when run, implements the method according to any one of claims 1 to 7.
13. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed, the method according to any one of claims 1 to 7 is implemented.
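To make the interception mechanism of claims 2 to 4 concrete, the following is a minimal sketch of how an injected vGPU runtime could hijack a vendor GPU API and apply a per-container device mapping. The vendor entry point vendorSetDevice() and the environment variable VGPU_DEVICE_MAP are hypothetical names introduced here for illustration; the patent does not specify them, and the LD_PRELOAD-style shim shown is only one plausible realization of "hijacking calls to a first API and providing a second API", not the patent's actual implementation.

```c
/* vgpu_shim.c — a minimal sketch, assuming a hypothetical vendor symbol
 * vendorSetDevice() and a hypothetical variable VGPU_DEVICE_MAP.
 * Built as a shared library and injected into a container via LD_PRELOAD,
 * it remaps virtual device indices to the physical GPUs the container is
 * allowed to call. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int (*set_device_fn)(int device);

/* Parse a mapping like VGPU_DEVICE_MAP="0:2,1:5" (virtual:physical). */
static int map_device(int virtual_id) {
    const char *map = getenv("VGPU_DEVICE_MAP");
    if (!map) return virtual_id;              /* no mapping injected: pass through */
    char buf[256];
    strncpy(buf, map, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
        int v, p;
        if (sscanf(tok, "%d:%d", &v, &p) == 2 && v == virtual_id)
            return p;                         /* redirect to the mapped physical GPU */
    }
    return virtual_id;
}

/* Exported under the same name as the (hypothetical) vendor symbol, so the
 * dynamic linker resolves the container's calls here first (the "first API"
 * of claim 3); the shim then forwards to the real library, playing the role
 * of the "second API" that reaches the container's vGPU instance. */
int vendorSetDevice(int device) {
    static set_device_fn real = NULL;
    if (!real)
        real = (set_device_fn)dlsym(RTLD_NEXT, "vendorSetDevice");
    if (!real) {
        fprintf(stderr, "vgpu shim: real vendorSetDevice not found\n");
        return -1;
    }
    return real(map_device(device));
}
```

Compiled with `gcc -shared -fPIC -o vgpu_shim.so vgpu_shim.c -ldl` and launched as, say, `LD_PRELOAD=/opt/vgpu_shim.so VGPU_DEVICE_MAP=0:2,1:5 ./app`, the container's application would see virtual devices 0 and 1 while actually running on physical GPUs 2 and 5 — one possible reading of "injecting mapping relationships or modifying GPU runtime environment variables" in claim 1.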
CN202210294026.6A 2022-03-23 2022-03-23 Device management method and device Active CN114625482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210294026.6A CN114625482B (en) 2022-03-23 2022-03-23 Device management method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210294026.6A CN114625482B (en) 2022-03-23 2022-03-23 Device management method and device

Publications (2)

Publication Number Publication Date
CN114625482A CN114625482A (en) 2022-06-14
CN114625482B true CN114625482B (en) 2025-09-23

Family

ID=81905156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210294026.6A Active CN114625482B (en) 2022-03-23 2022-03-23 Device management method and device

Country Status (1)

Country Link
CN (1) CN114625482B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116578413B (en) * 2023-04-26 2024-04-12 中国人民解放军92942部队 Signal-level simulation model clouding method based on cloud+end architecture
CN117234741B (en) * 2023-11-14 2024-02-20 苏州元脑智能科技有限公司 Resource management and scheduling method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111223036A (en) * 2019-12-29 2020-06-02 广东浪潮大数据研究有限公司 GPU virtualization sharing method and device, electronic equipment and storage medium
CN111274041A (en) * 2020-02-24 2020-06-12 北京达佳互联信息技术有限公司 Graphics processor mounting method and device, electronic equipment and storage medium
CN111913794A (en) * 2020-08-04 2020-11-10 北京百度网讯科技有限公司 Method, apparatus, electronic device, and readable storage medium for sharing GPUs

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9928109B2 (en) * 2012-05-09 2018-03-27 Nvidia Corporation Method and system for processing nested stream events
CN104216783B (en) * 2014-08-20 2017-07-11 上海交通大学 Virtual GPU resource autonomous management and control method in cloud game
US10325343B1 (en) * 2017-08-04 2019-06-18 EMC IP Holding Company LLC Topology aware grouping and provisioning of GPU resources in GPU-as-a-Service platform
CN108279979B (en) * 2018-01-19 2021-02-19 聚好看科技股份有限公司 Method and device for binding CPU for application program container
CN110764901B (en) * 2019-09-17 2021-02-19 创新先进技术有限公司 Data processing method based on GPU (graphics processing Unit) resources, electronic equipment and system
US11182196B2 (en) * 2019-11-13 2021-11-23 Vmware, Inc. Unified resource management for containers and virtual machines
CN113986466A (en) * 2021-11-01 2022-01-28 北京计算机技术及应用研究所 A GPU virtualization system and method for cloud computing

Also Published As

Publication number Publication date
CN114625482A (en) 2022-06-14

Similar Documents

Publication Publication Date Title
US8533735B2 (en) System for execution context isolation in response to invoking a BIOS kernel function during a driver execution environment (DXE) phase of boot-up of a computer
JP2020064678A (en) Configurable logic platform
US20050216920A1 (en) Use of a virtual machine to emulate a hardware device
CN108932154B (en) Distributed virtual machine manager
WO2016155335A1 (en) Task scheduling method and device on heterogeneous multi-core reconfigurable computing platform
CN114625482B (en) Device management method and device
JP2019530100A (en) Configurable logical platform with multiple reconfigurable regions
KR102092459B1 (en) Method and System to manage and schedule GPU memory resource in Container-based virtualized environment
KR20120098838A (en) Method and apparatus for handling an i/o operation in a virtualization environment
US9131031B2 (en) Virtual computer system, virtual computer management program, and MAC address management method
US10983847B2 (en) Dynamically loadable unikernel binaries
US10241829B2 (en) Information processing device, information processing method, recording medium, calculation processing device, calculation processing method
EP3633507B1 (en) Technologies for secure and efficient native code invocation for firmware services
US20220198017A1 (en) System and method to support smm update and telemetry in runtime for baremetal deployment
WO2023174146A1 (en) Offloading-card namespace management system and method, and input/output request processing system and method
CN106708619B (en) Resource management method and device
CN111247512B (en) Computer system for unified memory access
US12204940B2 (en) Transparent and remote kernel execution in a heterogeneous computing system
US11003618B1 (en) Out-of-band interconnect control and isolation
CN118331687A (en) User-state paravirtualized data path acceleration method, device, cluster and medium
US10691471B2 (en) Conflict resolution for strong symbols
US10922149B2 (en) System comprising a plurality of virtualization systems
CN116048827A (en) Inter-process function call method and related equipment
US11074200B2 (en) Use-after-free exploit prevention architecture
US12039339B1 (en) System configuration control through locking of control registers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant