
CN120429091B - Method, apparatus, product, server and medium for controlling a job process - Google Patents

Method, apparatus, product, server and medium for controlling a job process

Info

Publication number
CN120429091B
CN120429091B (application CN202510919781.2A)
Authority
CN
China
Prior art keywords
job
resource
job process
resources
controlling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202510919781.2A
Other languages
Chinese (zh)
Other versions
CN120429091A (en)
Inventor
王旭东
郭立民
吕文文
杜海超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur (Jinan) Data Technology Co., Ltd.
Original Assignee
Inspur (Jinan) Data Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur (Jinan) Data Technology Co., Ltd.
Priority to CN202510919781.2A
Publication of CN120429091A
Application granted
Publication of CN120429091B
Legal status: Active

Landscapes

  • Multi Processors (AREA)

Abstract

The invention discloses a method, an apparatus, a product, a server and a medium for controlling a job process, and relates to the field of high-performance computing clusters. In the method, each job process is allocated tokens that characterize its resource usage, the number of tokens is updated according to the measured resource utilization, and the job process uses resources according to the updated number of tokens. Compared with directly limiting how long a resource may be used once its utilization is high, allocating tokens to every process and having each process use resources according to its tokens guarantees that every process remains entitled to use the resource, improving fairness. Because a process's resource use is controlled through a token count updated from the measured utilization, rather than by relying on the utilization alone, fine-grained control of resources is achieved. Furthermore, different job processes correspond to different contexts, so job processes do not interfere with one another and process isolation is achieved.

Description

Method, apparatus, product, server and medium for controlling a job process
Technical Field
The present invention relates to the field of high performance computing clusters, and in particular, to a method, an apparatus, a product, a server, and a medium for controlling a job process.
Background
With the widespread adoption of the Internet of Things, data volumes are growing explosively. To meet the resulting demand for computation, many high-performance servers are interconnected over high-speed networks to form high-performance computing clusters that process large amounts of data in parallel at very high speed.
To control resource usage, the related art monitors how resources are used and restricts their use when utilization is high. If some processes occupy a large share of the resources, other processes cannot use them; that is, multiple processes cannot share the resources fairly.
Therefore, how to improve fairness among processes when multiple processes share resources is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a method, an apparatus, a product, a server and a medium for controlling a job process, so as to solve the problem of poor fairness when multiple processes share resources.
In order to solve the above technical problem, the present invention provides a method for controlling a job process, applied to a computing node, the method comprising:
controlling a job process to call a preset file library and obtaining, from the preset file library, the context recorded for the job process that characterizes its resource usage, wherein different job processes correspond to different contexts in the preset file library;
allocating tokens characterizing resource usage to the job process according to the context corresponding to the job process;
obtaining the resource utilization and updating the number of tokens corresponding to the job process according to the resource utilization;
and controlling the job process to use resources according to the updated number of tokens.
In one aspect, before controlling the job process to call the preset file library and obtaining the context characterizing resource usage recorded for the job process in the preset file library, the method further comprises:
receiving, through a scheduling system, a job submitted by a user, wherein the job specifies the model and the quantity of resources to be used, and the quantity may be an integer or a non-integer;
after receiving the job submitted by the user, the scheduling system further:
selects a computing node that meets a preset condition according to the job, and marks the usage state of the resources after the resources on the computing node are allocated to the job, wherein the preset condition is that the quantity of resources on the computing node is greater than or equal to the quantity of resources requested in the job, and the usage state is either fully used or partially used.
In another aspect, after receiving the job submitted by the user through the scheduling system, and before controlling the job process to call the preset file library and obtaining the context characterizing resource usage recorded for the job process, the method further comprises:
creating a daemon, and creating, by the daemon, a namespace and a control group that characterize limits on the resources used by the job process;
receiving, through the daemon, a job process to be started sent by the scheduling system, and controlling the job process to start;
adding the job process to the namespace and the control group;
wherein controlling the job process to call the preset file library comprises:
controlling the job process in the namespace and the control group to call the preset file library.
In another aspect, controlling the job process to call the preset file library comprises:
when the quantity of resources allocated to the job process is detected to be an integer, placing the directory of the job process into the original preset file library to control the job process to call the preset file library, wherein the original preset file library allows the job process to directly access and manage resources;
when the quantity of resources allocated to the job process is detected to be a non-integer, modifying files in the original preset file library to obtain a new preset file library, and placing the directory of the job process into the new preset file library to control the job process to call the preset file library, wherein the new preset file library allows only part of the job processes to directly access and manage resources.
In another aspect, obtaining the resource utilization comprises:
creating a monitoring process that monitors the resource utilization of each job process;
obtaining a resource-utilization history curve of the job process through the monitoring process;
and determining the average resource utilization from the resource-utilization history curve of the job process as the resource utilization.
In another aspect, the resource is a hardware device for parallel computing, and before allocating tokens characterizing resource usage to the job process according to the context corresponding to the job process, the method further comprises:
obtaining the computing capability of the hardware device for parallel computing, and setting a first preset number of tokens in a token bucket according to that computing capability;
wherein allocating tokens characterizing resource usage to the job process according to the context corresponding to the job process comprises:
if it is determined from the context corresponding to the job process that the job process is prohibited from submitting computing tasks to the target resource, allocating 0 tokens characterizing resource usage to the job process;
if it is determined from the context corresponding to the job process that the job process is allowed to submit computing tasks to the target resource, allocating a second preset number of tokens characterizing resource usage to the job process according to that context, wherein the first preset number is greater than the second preset number.
In another aspect, updating the number of tokens corresponding to the job process according to the resource utilization comprises:
determining the current token count corresponding to the current resource utilization according to a preset relationship between the utilization of the hardware device for parallel computing and the token count;
and deducting the current token count from the first preset number of tokens to update the number of tokens corresponding to the job process, wherein the updated number of tokens is the difference obtained by subtracting the current token count from the first preset number.
In another aspect, controlling the job process to use resources according to the updated number of tokens comprises:
allowing the job process to submit computing tasks to use resources when the updated number of tokens is detected to be greater than a first preset value;
and prohibiting the job process from submitting computing tasks, so as to prohibit resource use, when the updated number of tokens is detected to be equal to the first preset value.
In another aspect, the method for controlling a job process further comprises:
predicting the usage trend of the hardware device for parallel computing through a time-series prediction algorithm when the updated number of tokens is detected to be a second preset value, wherein the second preset value is greater than the first preset value and the difference between them is smaller than a preset difference;
and prohibiting the job process from submitting computing tasks, so as to prohibit resource use, if the usage trend is detected to be rising.
In another aspect, the method for controlling a job process further comprises:
pre-creating a buffer instruction queue for storing blocked computing tasks;
and after determining from the context corresponding to the job process that the job process is prohibited from submitting computing tasks to the target resource, or after prohibiting the job process from submitting computing tasks, placing the computing tasks prohibited from being submitted into the buffer instruction queue.
In another aspect, the method for controlling a job process further comprises:
obtaining the occupancy of computing tasks in the buffer instruction queue;
and if the computing tasks occupy the entire buffer instruction queue, suspending, from the operating system, the execution of the job process on the central processing unit.
In another aspect, the method for controlling a job process further comprises:
if a target job process whose resource utilization is greater than a preset utilization is detected, forcibly switching out the context of the target job process and suspending its execution;
and controlling the contexts of other job processes to be switched in for execution on the hardware device for parallel computing, wherein the other job processes are all job processes except the target job process.
In another aspect, controlling the contexts of other job processes to be switched in for execution on the hardware device for parallel computing comprises:
obtaining the priority order of all other job processes;
and selecting the job process with the highest priority and controlling its context to be switched in for execution on the hardware device for parallel computing.
In another aspect, the method for controlling a job process further comprises:
after the updated number of tokens becomes equal to 0, if the resource utilization of a job process prohibited from submitting computing tasks is detected to decrease, issuing tokens to that job process again;
and when a token count greater than 0 is detected, resuming execution of the context of the job process prohibited from submitting computing tasks.
In another aspect, resuming execution of the context of the job process prohibited from submitting computing tasks comprises:
if the context of the job process prohibited from submitting computing tasks is detected to be suspended, resuming its execution on the hardware device for parallel computing;
or, if remaining space exists in the instruction queue, migrating the computing tasks in the buffer instruction queue to the instruction queue, wherein the instruction queue stores computing tasks that are allowed to be submitted;
and if the job process migrated to the instruction queue is detected to be in a state in which its execution on the central processing unit has been suspended by the operating system, resuming, from the operating system, the execution of the job process on the central processing unit.
In another aspect, the method for controlling a job process further comprises:
releasing the resources occupied by the job process when the end of the job process is detected;
and controlling the daemon to release the namespace and the control group characterizing limits on the resources used by the job process.
In order to solve the above technical problem, the present invention further provides an apparatus for controlling a job process, applied to a computing node, the apparatus comprising:
a first control module for controlling a job process to call a preset file library and obtaining, from the preset file library, the context recorded for the job process that characterizes its resource usage, wherein different job processes correspond to different contexts in the preset file library;
an allocation module for allocating tokens characterizing resource usage to the job process according to the context corresponding to the job process;
an obtaining and updating module for obtaining the resource utilization and updating the number of tokens corresponding to the job process according to the resource utilization;
and a second control module for controlling the job process to use resources according to the updated number of tokens.
In order to solve the above technical problem, the present invention further provides a computer program product, comprising a computer program/instructions which, when executed by a processor, implement the steps of the above method for controlling a job process.
In order to solve the above technical problem, the present invention further provides a server, including:
a memory for storing a computer program;
and a processor for implementing the steps of the above method for controlling a job process when executing the computer program.
In order to solve the above technical problem, the present invention further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above method for controlling a job process.
The beneficial effects are as follows. In the method, each process is allocated tokens characterizing its resource usage, the number of tokens is updated according to the resource utilization, and the job process uses resources according to the updated number of tokens. Compared with the related-art approach of directly limiting how long resources may be used once utilization is high, allocating tokens to every process and having processes use resources according to their tokens guarantees that each process remains entitled to use the resources, improving fairness. Because a process's resource use is controlled through a token count updated from the measured utilization, rather than by relying on the utilization alone, fine-grained control of resources is achieved. Moreover, before tokens are allocated, the job process is controlled to call the preset file library and obtain the context characterizing its resource usage recorded there. Compared with placing all processes in the same context, in the preset file library of this method different job processes correspond to different contexts, which ensures that job processes do not interfere with one another and achieves process isolation.
In addition, the computing node receives jobs submitted by users through the scheduling system. A submitted job specifies the model and the quantity of resources to be used, and the quantity may be an integer or a non-integer. In other words, the method supports requesting not only an integer number of resources but also a non-integer amount, which improves resource utilization. After receiving a job, the scheduling system selects a computing node whose quantity of resources is greater than or equal to the quantity requested in the job, ensuring the job can be processed. After allocating resources on the computing node to the job, the scheduling system marks their usage state, such as fully used or partially used, so that the usage of resources can be seen at a glance.
The method creates a daemon; the daemon creates a namespace and a control group characterizing limits on the resources used by the job process, receives the job process to be started sent by the scheduling system, controls it to start, adds it to the namespace and the control group, and controls the job process in the namespace and control group to call the preset file library. Because the namespace provides an isolation mechanism and the control group limits the resources a process may use, resources are not merely limited: different resources are isolated cooperatively, forming a complete job isolation system. Unified management by the daemon also strengthens the stability and security of the multi-resource environment.
When the quantity of resources allocated to the job process is detected to be an integer, the job process is directly controlled to call the original preset file library (which allows the job process to directly access and manage resources); when the quantity is detected to be a non-integer, the directory of the job process is placed into a new preset file library, in which only part of the job processes are allowed to directly access and manage resources, to control the job process to call it. The method thus distinguishes integer from non-integer resource-allocation scenarios. Jobs that use whole resources exclusively use the original preset file library, avoiding the extra performance loss of hijacking; jobs that share resources use the modified preset file library, which limits resources. This "hijack on demand" design maximizes performance while guaranteeing isolation. For computation-intensive jobs (e.g., deep-learning training), no extra overhead is incurred when resources are exclusive, while efficient isolation is achieved in shared scenarios, so both performance and fairness are obtained.
The method creates a monitoring process that monitors the resource utilization of each job process, obtains a resource-utilization history curve of the job process through the monitoring process, and determines the average resource utilization from that curve as the resource utilization, so that the obtained utilization is representative.
If it is determined from the context corresponding to the job process that submitting computing tasks to the target resource is allowed, a second preset number of tokens characterizing resource usage is allocated to the job process according to that context. Because the first preset number is greater than the second preset number, overloading the hardware device for parallel computing is avoided and the reliability of the system is improved.
When the number of tokens corresponding to the job process is updated according to the resource utilization, the current token count corresponding to the current utilization is determined according to a preset relationship between the utilization of the hardware device for parallel computing and the token count, and that count is deducted from the first preset number of tokens, the updated number of tokens being the difference between the first preset number and the current token count. This allows the number of tokens to deduct to be determined accurately, so the job process uses resources according to a more accurate token count, achieving fine-grained control of resource use.
When the updated number of tokens is detected to be greater than a first preset value, the job process is allowed to submit computing tasks; when it is detected to be equal to the first preset value, the job process is prohibited from submitting computing tasks, preventing it from occupying further resources.
When the updated number of tokens is detected to be a second preset value, the usage trend of the hardware device for parallel computing is predicted through a time-series prediction algorithm, wherein the second preset value is greater than the first preset value and the difference between them is smaller than a preset difference; if the trend is detected to be rising, the job process is prohibited from submitting computing tasks, so as to prohibit resource use. By predicting the usage trend and blocking submission while the token count is still at the second preset value, the method blocks a process's resource use in advance.
After it is determined from the context corresponding to the job process that the job process is prohibited from submitting computing tasks to the target resource, or after the job process is prohibited from submitting computing tasks, the computing tasks prohibited from being submitted are placed into a buffer instruction queue, which stores the blocked tasks. If the computing tasks are detected to occupy the entire buffer instruction queue, the execution of the job process on the central processing unit is suspended from the operating system. Suspending the process ensures that no new instruction is submitted to the resource for execution, so the buffer instruction queue cannot overflow.
If a target job process whose resource utilization is greater than a preset utilization is detected, its context is forcibly switched out and its execution is suspended, and the contexts of other job processes are switched in for execution on the hardware device for parallel computing. On the one hand this prevents the target job process from driving resource utilization even higher; on the other hand it ensures that other job processes can use the resources, i.e., resource allocation is adjusted dynamically and flexibly. In addition, the job process with the highest priority is selected and its context is switched in for execution, ensuring that high-priority job processes are executed in time.
After the updated number of tokens becomes equal to 0, if the resource utilization of a job process prohibited from submitting computing tasks is detected to decrease, tokens are issued to that job process again; when a token count greater than 0 is detected, execution of its context is resumed. The system thus replenishes tokens as promptly as it previously deducted them, adjusting how processes use resources. This dynamism adapts to the load fluctuation common in high-performance computing (such as the iteration-by-iteration variation of Artificial Intelligence (AI) model training), prevents resource overuse, and allows job execution to resume quickly after resources are released, improving the responsiveness of the system.
The daemon is also controlled to release the namespace and the control group characterizing limits on the resources used by the job process, ensuring the completeness of resource management.
In addition, the invention further provides an apparatus, a computer program product, a server and a computer-readable storage medium for controlling a job process, which have technical features identical or corresponding to those of the above method and achieve the same effects.
Drawings
For a clearer description of the embodiments of the present invention, the drawings required by the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present invention; other drawings may be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is a flow chart of a control method for a job process according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a resource used by a job process according to an embodiment of the present invention;
FIG. 3 is an overall schematic diagram of a GPU sharing and limiting scheme based on CUDA hijacking in a high-performance computing scenario according to an embodiment of the present invention;
FIG. 4 is a block diagram of a server according to another embodiment of the present invention.
Detailed Description
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without inventive effort fall within the scope of the present invention.
The core of the invention is to provide a method, an apparatus, a product, a server and a medium for controlling a job process, so as to solve the problem of poor fairness when multiple processes share resources.
In order to better understand the aspects of the present invention, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. FIG. 1 is a flowchart of a method for controlling a job process according to an embodiment of the present invention. The method is applied to a computing node; for example, the computing node is a server including a Graphics Processing Unit (GPU). As shown in FIG. 1, the method includes:
S10: control a job process to call a preset file library and obtain, from the preset file library, the context recorded for the job process that characterizes its resource usage, wherein different job processes correspond to different contexts in the preset file library;
S11: allocate tokens characterizing resource usage to the job process according to the context corresponding to the job process;
S12: obtain the resource utilization and update the number of tokens corresponding to the job process according to the resource utilization;
S13: control the job process to use resources according to the updated number of tokens.
A job in the present invention refers to a computing task that runs in a high-performance computing environment, typically involving large-scale data processing, complex numerical simulation, or parallel computing. Such jobs solve complex computing problems in science, engineering, data analysis, and other fields using the resources of a high-performance computing cluster (including multi-core Central Processing Units (CPUs), GPU accelerators, or other special-purpose hardware).
In order to obtain the job, before controlling the job process to call the preset file library and obtaining the context characterizing resource usage recorded for the job process, the method further comprises:
receiving, through a scheduling system, a job submitted by a user, wherein the job specifies the model and the quantity of resources to be used, and the quantity may be an integer or a non-integer.
In a method that controls job processes' resource usage with MIG (Multi-Instance GPU), hardware-level multi-instance technology divides a single resource (such as a GPU) into at most 7 independent instances, each with independent video memory and computing resources, supporting sharing by multiple users or jobs. MIG achieves isolation through hardware partitioning, its granularity is fixed (preset instance sizes such as 1/7 and 2/7), and it cannot support flexible non-integer allocation (such as 0.5 GPU). For job processes that do not occupy a whole resource, this wastes resources. To reduce such waste, in the invention, after receiving the job submitted by the user, the scheduling system further:
selects a computing node that meets a preset condition according to the job, and marks the usage state of the resources after the resources on the computing node are allocated to the job, wherein the preset condition is that the quantity of resources on the computing node is greater than or equal to the quantity requested in the job, and the usage state is either fully used or partially used.
In the method, after receiving the job submitted by the user, the scheduling system selects, according to the job, a computing node whose quantity of resources is greater than or equal to the quantity requested, ensuring the job can be processed. After allocating the resources on the computing node to the job, the scheduling system marks their usage state, such as fully used or partially used, so that the usage of resources can be seen at a glance.
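As an illustrative sketch only (the patent gives no code), the node-selection step with fractional requests could look like the following C fragment; the structure fields and the hundredth-of-a-GPU granularity are assumptions.

    /* Minimal sketch of fractional-GPU node selection (assumption: free GPU
     * capacity tracked in hundredths of a GPU, so 0.5 GPU = 50 units). */
    #include <stddef.h>

    typedef struct {
        const char *name;
        int free_gpu_units;   /* remaining capacity, 100 units per GPU */
    } node;

    /* Returns the first node able to host the request, or NULL; marks the
     * resource fully or partially used, as the scheduler in the text does. */
    node *select_node(node *nodes, size_t n, int requested_units, int *partial)
    {
        for (size_t i = 0; i < n; i++) {
            if (nodes[i].free_gpu_units >= requested_units) {
                nodes[i].free_gpu_units -= requested_units;
                *partial = (requested_units % 100) != 0;  /* non-integer share */
                return &nodes[i];
            }
        }
        return NULL;   /* no node satisfies the preset condition */
    }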
To avoid interference among multiple job processes, in an implementation, after receiving the job submitted by the user through the scheduling system, and before controlling the job process to call the preset file library and obtaining the context characterizing resource usage recorded for the job process, the method further comprises:
creating a daemon, and creating, by the daemon, a namespace (e.g., Namespace) and a control group (e.g., Cgroup) characterizing limits on the resources used by the job process;
receiving, through the daemon, the job process to be started sent by the scheduling system, and controlling the job process to start;
adding the job process to the namespace and the control group.
Controlling the job process to call the preset file library then comprises controlling the job process in the namespace and the control group to call the preset file library.
A Namespace is an isolation mechanism that divides system resources (such as process IDs, network interfaces, and file-system mount points) into independent logical areas that do not interfere with each other. Each namespace maintains its own view of resources, so resources in different namespaces can carry the same identifier (e.g., the same process ID or IP address) without conflict. A Cgroup is a kernel feature that lets the operating system manage system resources by controlling and limiting the resource usage of a group of processes.
The method creates a daemon; the daemon creates the namespace and control group characterizing limits on the resources used by the job process, receives the job process to be started sent by the scheduling system, controls it to start, adds it to the namespace and control group, and controls the job process in the namespace and control group to call the preset file library. Because the namespace provides isolation and the control group limits the resources a process may use, resources are not merely limited: different resources are isolated cooperatively, forming a complete job isolation system. Unified management by the daemon also strengthens the stability and security of the multi-resource environment.
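The text does not give an implementation, but a minimal daemon-side sketch of these steps on Linux might look as follows, assuming cgroup v2 mounted at /sys/fs/cgroup and a privileged daemon; the job name, CPU limit, and binary path are illustrative.

    /* Sketch: create a control group, start the job process in fresh
     * namespaces, and add it to the cgroup (error handling abbreviated;
     * requires root / CAP_SYS_ADMIN). */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    static void write_file(const char *path, const char *val)
    {
        int fd = open(path, O_WRONLY);
        if (fd < 0) { perror(path); exit(1); }
        write(fd, val, strlen(val));
        close(fd);
    }

    int main(void)
    {
        /* 1. create the control group and cap CPU usage (50% of one core) */
        mkdir("/sys/fs/cgroup/job42", 0755);
        write_file("/sys/fs/cgroup/job42/cpu.max", "50000 100000");

        pid_t pid = fork();
        if (pid == 0) {
            /* 2. child: enter fresh namespaces, join the cgroup, exec job */
            unshare(CLONE_NEWNS | CLONE_NEWIPC | CLONE_NEWUTS);
            char buf[32];
            snprintf(buf, sizeof buf, "%d", getpid());
            write_file("/sys/fs/cgroup/job42/cgroup.procs", buf);
            execlp("./job_process", "./job_process", (char *)NULL);
            perror("execlp");
            exit(1);
        }
        return 0;
    }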
The number of job processes is not limited and depends on the actual situation. A job process calls the preset file library, and the context recorded there for the job process characterizes its resource usage. Specifically, the preset file library may be a Compute Unified Device Architecture (CUDA) library. CUDA is a general-purpose parallel computing architecture that enables GPUs to solve complex computing problems; it comprises the CUDA Instruction Set Architecture (ISA) and the parallel computing engine inside the GPU. The context recorded in the preset file library that characterizes resource usage is, for example, a GPU context. A GPU context is an abstraction for managing and isolating GPU resources and the execution environment: it gives each program running on the GPU an independent execution space, ensuring that different programs do not interfere with each other, and it stores the GPU's running state, including registers, memory mappings, configuration parameters, and the like.
In a method that controls job processes' resource usage with MPS (Multi-Process Service), MPS allows multiple processes to share the same GPU and provides a degree of resource isolation and scheduling by managing kernel submission and memory allocation centrally in a server-client mode. In that client mode, all job processes are placed in the same context, so when the context of one process fails, the contexts of the other processes fail with it. The invention therefore uses a preset file library (such as the CUDA library) in which different job processes correspond to different contexts: each job process's context is independent rather than shared, which achieves isolation of job processes.
To allocate integer or non-integer resources, in an implementation, controlling the job process to call the preset file library comprises:
when the quantity of resources allocated to the job process is detected to be an integer, placing the directory of the job process into the original preset file library to control the job process to call it, wherein the original preset file library allows the job process to directly access and manage resources;
when the quantity of resources allocated to the job process is detected to be a non-integer, modifying files in the original preset file library to obtain a new preset file library, and placing the directory of the job process into the new preset file library to control the job process to call it, wherein the new preset file library allows only part of the job processes to directly access and manage resources.
The method thus distinguishes integer from non-integer resource-allocation scenarios. Jobs that use whole resources exclusively use the original preset file library, avoiding the extra performance loss of hijacking; jobs that share resources use the modified preset file library, which limits resources. This "hijack on demand" design maximizes performance while guaranteeing isolation. For computation-intensive jobs (e.g., deep-learning training), no extra overhead is incurred when resources are exclusive, while efficient isolation is achieved in shared scenarios, so both performance and fairness are obtained.
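A minimal sketch of what such hijacking can look like is given below. It assumes the common LD_PRELOAD interposition technique (the patent instead ships a modified library file) and intercepts the real CUDA Driver API entry point cuLaunchKernel; the two token-bucket hooks are stubs standing in for the mechanism described later.

    /* Build as a shared library and inject via LD_PRELOAD to interpose on
     * the CUDA Driver API; link with -ldl -lcuda. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <cuda.h>

    static int tokens_available(void) { return 1; }  /* stub: query token bucket */
    static void buffer_or_block(void) { }            /* stub: enqueue or stall   */

    typedef CUresult (*launch_fn)(CUfunction, unsigned, unsigned, unsigned,
                                  unsigned, unsigned, unsigned, unsigned,
                                  CUstream, void **, void **);

    CUresult cuLaunchKernel(CUfunction f,
                            unsigned gx, unsigned gy, unsigned gz,
                            unsigned bx, unsigned by, unsigned bz,
                            unsigned shmem, CUstream stream,
                            void **params, void **extra)
    {
        static launch_fn real;
        if (!real)
            real = (launch_fn)dlsym(RTLD_NEXT, "cuLaunchKernel");

        while (!tokens_available())   /* gate kernel submission on tokens */
            buffer_or_block();

        return real(f, gx, gy, gz, bx, by, bz, shmem, stream, params, extra);
    }

Only jobs with a non-integer GPU share would be launched with this shim in place; exclusive jobs load the native library and call straight through, which is the point of the dual-mode design.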
After the context characterizing resource usage recorded for the job process is obtained from the preset file library, tokens characterizing resource usage are allocated to the job process according to that context. To make the allocated number of tokens more appropriate, in an implementation the resource is a hardware device for parallel computing (such as a GPU), and before allocating tokens to the job process according to its context, the method further comprises:
obtaining the computing capability of the hardware device for parallel computing and setting a first preset number of tokens in a token bucket according to that capability. The first preset number is not limited; for example, if the computing capability of one GPU is rated as 100, 100 tokens are set in the corresponding token bucket.
Allocating tokens characterizing resource usage to the job process according to its context then comprises:
if it is determined from the context corresponding to the job process that the job process is prohibited from submitting computing tasks to the target resource, allocating 0 tokens characterizing resource usage to the job process;
if it is determined from the context corresponding to the job process that submitting computing tasks to the target resource is allowed, allocating a second preset number of tokens characterizing resource usage to the job process according to that context, wherein the first preset number is greater than the second preset number.
In the method, the first preset number being greater than the second preset number avoids overloading the hardware device for parallel computing and improves the reliability of the system.
After tokens characterizing resource usage are allocated to a job process, the number of tokens can be updated further according to the resource utilization, preventing some job processes from occupying resources indefinitely. First, the resource utilization must be obtained. In an implementation, obtaining the resource utilization comprises:
creating a monitoring process that monitors the resource utilization of each job process;
obtaining a resource-utilization history curve of the job process through the monitoring process;
and determining the average resource utilization from the resource-utilization history curve of the job process as the resource utilization.
For example, a monitoring process is started on the computing node and uses the NVML library to monitor the GPU usage of each GPU job process; every computing node running GPU jobs needs to run such a monitor. The monitoring process is an infinite loop that continuously samples and records GPU utilization, yielding a GPU-utilization history curve for the process from which the average GPU utilization can be computed.
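A minimal monitoring-loop sketch using the NVML C API is shown below; it samples device-level utilization, whereas the per-process attribution described in the text would use NVML's process-utilization queries. The one-second interval and the fixed sample count are assumptions.

    /* Compile with -lnvidia-ml. Samples GPU 0 and keeps a running average,
     * standing in for the "history curve" of the text. */
    #include <stdio.h>
    #include <unistd.h>
    #include <nvml.h>

    int main(void)
    {
        nvmlDevice_t dev;
        nvmlUtilization_t util;
        double sum = 0.0;

        if (nvmlInit_v2() != NVML_SUCCESS) return 1;
        if (nvmlDeviceGetHandleByIndex_v2(0, &dev) != NVML_SUCCESS) return 1;

        for (int i = 1; i <= 3600; i++) {      /* the monitor's sampling loop */
            if (nvmlDeviceGetUtilizationRates(dev, &util) == NVML_SUCCESS) {
                sum += util.gpu;               /* percent busy since last call */
                printf("avg GPU util: %.1f%%\n", sum / i);
            }
            sleep(1);                          /* interval is an assumption */
        }
        nvmlShutdown();
        return 0;
    }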
In an implementation, updating the number of tokens corresponding to the job process according to the resource utilization comprises:
determining the current token count corresponding to the current resource utilization according to a preset relationship between the utilization of the hardware device for parallel computing and the token count;
and deducting the current token count from the first preset number of tokens to update the number of tokens corresponding to the job process, wherein the updated number of tokens is the difference obtained by subtracting the current token count from the first preset number.
For example, if the average GPU utilization of a job process is 20%, the token count consumed by the job according to the relationship between GPU utilization and tokens is 100 x 20% = 20. Since the job process was initially allocated 50 tokens, 30 tokens remain after the 20 are consumed, and the number of tokens still available to the process in the token bucket is updated to 30.
After the updated number of tokens is obtained, controlling the job process to use resources according to the updated number of tokens comprises:
allowing the job process to submit computing tasks to use resources when the updated number of tokens is detected to be greater than a first preset value;
and prohibiting the job process from submitting computing tasks, so as to prohibit resource use, when the updated number of tokens is detected to be equal to the first preset value.
The first preset value is not limited; for example, it is 0. That is, the job process is allowed to submit computing tasks when the updated token count is detected to be greater than 0, and prohibited from submitting them, so as to prohibit resource use, when the updated token count is detected to be equal to 0.
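Pulling the allocation, deduction, and gating steps together, a minimal token-bucket sketch might look as follows; the 100-token capacity, the 50-token share, and a first preset value of 0 follow the examples in the text, while the names are illustrative.

    /* Minimal token-bucket sketch (assumption: 1 token = 1% of one GPU,
     * as in the text's "100 tokens per GPU" example). */
    typedef struct {
        int capacity;    /* first preset number, e.g. 100 for one GPU */
        int allocated;   /* second preset number granted to this job  */
        int remaining;   /* tokens the job may still consume          */
    } token_bucket;

    /* Allocation step: 0 tokens if the context forbids submission. */
    void allocate_tokens(token_bucket *b, int may_submit, int share)
    {
        b->remaining = b->allocated = may_submit ? share : 0;
    }

    /* Update step, following the worked example: 50 allocated, 20% average
     * utilization of a 100-token GPU -> 20 consumed -> 30 remaining. */
    void update_tokens(token_bucket *b, double avg_util)
    {
        int consumed = (int)(b->capacity * avg_util);
        b->remaining = b->allocated - consumed;
        if (b->remaining < 0) b->remaining = 0;   /* clamp against overshoot */
    }

    /* Gate: submission allowed only while tokens remain above the preset 0. */
    int may_submit_kernel(const token_bucket *b) { return b->remaining > 0; }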
Because limiting job processes' resource use through a token bucket has a certain lag, a job with a short burst of high resource utilization may drive the bucket's token count negative. To avoid this, in practice the method for controlling a job process further comprises:
predicting the usage trend of the hardware device for parallel computing through a time-series prediction algorithm when the updated number of tokens is detected to be a second preset value, wherein the second preset value is greater than the first preset value and the difference between them is smaller than a preset difference;
and prohibiting the job process from submitting computing tasks, so as to prohibit resource use, if the usage trend is detected to be rising.
The second preset value and the preset difference are not limited; for example, if the first preset value is 0, the second preset value is a value close to 0.
The method thus uses a time-series prediction algorithm to predict the resource-usage trend and prohibits the job process from submitting computing tasks while the token count is still at the second preset value, blocking the process's resource use in advance.
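The text does not name a specific time-series algorithm; as one stand-in, the sketch below fits a least-squares slope over the most recent samples and treats a positive slope as a rising trend.

    /* Trend-prediction sketch (assumption: a linear-regression slope over
     * the last N utilization samples stands in for the unspecified
     * time-series algorithm). */
    #define N 16

    int rising_trend(const double util[N])
    {
        double mean_t = (N - 1) / 2.0, mean_u = 0.0, num = 0.0, den = 0.0;
        for (int i = 0; i < N; i++) mean_u += util[i] / N;
        for (int i = 0; i < N; i++) {
            num += (i - mean_t) * (util[i] - mean_u);
            den += (i - mean_t) * (i - mean_t);
        }
        return (num / den) > 0.0;   /* slope > 0: block submissions early */
    }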
The computing tasks may be any parallel computing tasks, such as matrix multiplication, convolution, or vector addition. A computing task is also known as a Kernel function. Kernel functions are one of the core concepts of CUDA programming: functions that run exclusively on the GPU, written by the programmer in CUDA syntax to accelerate computation-intensive work by executing a large number of threads in parallel. Unlike ordinary C/C++ functions, a Kernel function is declared with the __global__ keyword, is called from host (CPU) code, and executes on the device (GPU); its purpose is to exploit the GPU's parallel computing capability for efficient data processing. So that blocked computing tasks can proceed later, in an implementation the method for controlling a job process further comprises (a queue sketch follows the steps below):
pre-creating a buffer instruction queue for storing blocked computing tasks;
and after determining from the context corresponding to the job process that the job process is prohibited from submitting computing tasks to the target resource, or after prohibiting the job process from submitting computing tasks, placing the computing tasks prohibited from being submitted into the buffer instruction queue.
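A minimal sketch of such a buffer instruction queue, assuming a fixed-capacity ring buffer of pending kernel launches, is given below; the capacity and field names are illustrative.

    #include <stdbool.h>

    #define QCAP 256

    typedef struct { void *args; } launch_req;   /* captured blocked launch */

    typedef struct {
        launch_req slots[QCAP];
        int head, tail, count;
    } buffer_queue;

    bool buffer_push(buffer_queue *q, launch_req r)
    {
        if (q->count == QCAP) return false;   /* full: caller must suspend job */
        q->slots[q->tail] = r;
        q->tail = (q->tail + 1) % QCAP;
        q->count++;
        return true;
    }

    bool buffer_full(const buffer_queue *q) { return q->count == QCAP; }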
Furthermore, to avoid overflow of the buffer instruction queue, in some embodiments the method for controlling a job process further comprises:
obtaining the occupancy of computing tasks in the buffer instruction queue;
and if the computing tasks are detected to occupy the entire buffer instruction queue, suspending, from the operating system, the execution of the job process on the central processing unit.
In the method, suspending the process guarantees that no new instruction is submitted to the resource for execution, so the buffer instruction queue cannot overflow.
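On Linux, suspending and resuming a job process from the operating system can be sketched with POSIX job-control signals, as below; this is an assumption about the mechanism, which the text leaves unspecified.

    #include <signal.h>
    #include <sys/types.h>

    void suspend_job(pid_t pid) { kill(pid, SIGSTOP); }  /* queue is full   */
    void resume_job(pid_t pid)  { kill(pid, SIGCONT); }  /* space reclaimed */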
Some job processes may use a relatively large share of resources, and if the hardware device for parallel computing keeps executing them, resource utilization rises further. To ensure that other job processes can also use the resources, in some embodiments the method for controlling a job process further comprises:
if a target job process whose resource utilization is greater than a preset utilization is detected, forcibly switching out the context of the target job process and suspending its execution;
and controlling the contexts of other job processes to be switched in for execution on the hardware device for parallel computing, wherein the other job processes are all job processes except the target job process.
The preset utilization and the choice of the other job processes are not limited and depend on the actual situation.
In implementations, controlling the contexts of other job processes to be switched in for execution on the hardware device for parallel computing comprises:
obtaining the priority order of all other job processes;
and selecting the job process with the highest priority and controlling its context to be switched in for execution on the hardware device for parallel computing.
In practice other scheduling algorithms may also be used to select the job process. This approach, on the one hand, prevents the target job process from driving resource utilization even higher and, on the other hand, ensures that other job processes can use the resources, i.e., resource allocation is adjusted dynamically and flexibly. Selecting and switching in the context of the highest-priority job process also ensures that high-priority job processes are executed in time.
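A minimal sketch of the highest-priority selection is shown below; the structure and the plain linear scan are assumptions, and as noted above any scheduling algorithm could take its place.

    /* Pick the context to switch onto the GPU next, skipping the target
     * (over-using) job process. */
    typedef struct { int pid; int priority; int wants_gpu; } job_ctx;

    job_ctx *pick_highest_priority(job_ctx *jobs, int n, int target_pid)
    {
        job_ctx *best = 0;
        for (int i = 0; i < n; i++) {
            if (jobs[i].pid == target_pid || !jobs[i].wants_gpu) continue;
            if (!best || jobs[i].priority > best->priority) best = &jobs[i];
        }
        return best;   /* NULL if no other job process is waiting */
    }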
In practice, loads fluctuate. To meet the needs of job processes under such fluctuation, in some embodiments the method for controlling a job process further comprises:
after the updated number of tokens becomes equal to 0, if the resource utilization of a job process prohibited from submitting computing tasks is detected to decrease, issuing tokens to that job process again;
and when a token count greater than 0 is detected, resuming execution of the context of the job process prohibited from submitting computing tasks.
Specifically, resuming execution of the context of the job process prohibited from submitting computing tasks comprises:
if the context of the job process prohibited from submitting computing tasks is detected to be suspended, resuming its execution on the hardware device for parallel computing;
or, if remaining space exists in the instruction queue, migrating the computing tasks in the buffer instruction queue to the instruction queue, wherein the instruction queue stores computing tasks that are allowed to be submitted;
and if the job process migrated to the instruction queue is detected to be in a state in which its execution on the central processing unit has been suspended by the operating system, resuming, from the operating system, the execution of the job process on the central processing unit.
The system thus replenishes tokens as promptly as it previously deducted them, adjusting how processes use resources. This dynamism adapts to the load fluctuation common in high-performance computing (such as the iteration-by-iteration variation of AI model training), prevents resource overuse, and allows job execution to resume quickly after resources are released, improving the responsiveness of the system.
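The recovery path of migrating buffered tasks into the live instruction queue once space appears can be sketched as follows; the queue shape matches the earlier ring-buffer sketch, and the two instruction-queue hooks are stubs standing in for the real submission path.

    #include <stdbool.h>

    #define QCAP 256

    typedef struct { void *args; } launch_req;
    typedef struct { launch_req slots[QCAP]; int head, tail, count; } buffer_queue;

    /* Stubs standing in for the live instruction queue of allowed tasks. */
    static bool instr_queue_has_space(void) { return true; }
    static void instr_queue_submit(launch_req r) { (void)r; }

    /* Migrate blocked computing tasks to the live queue while space remains. */
    void drain_buffered(buffer_queue *q)
    {
        while (q->count > 0 && instr_queue_has_space()) {
            launch_req r = q->slots[q->head];
            q->head = (q->head + 1) % QCAP;
            q->count--;
            instr_queue_submit(r);
        }
    }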
To ensure the completeness of resource management, in an implementation the method for controlling a job process further comprises:
releasing the resources occupied by the job process when the end of the job process is detected;
and controlling the daemon to release the namespace and the control group characterizing limits on the resources used by the job process.
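A daemon-side cleanup sketch is given below, again assuming cgroup v2 with one directory per job; an empty cgroup directory is removed with rmdir, and namespaces disappear on their own once their last member process exits.

    #include <stdio.h>
    #include <unistd.h>

    void release_job_cgroup(const char *job_id)
    {
        char path[256];
        snprintf(path, sizeof path, "/sys/fs/cgroup/%s", job_id);
        if (rmdir(path) != 0)       /* valid only once the cgroup is empty */
            perror("rmdir cgroup");
        /* namespaces are released automatically with their last process */
    }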
The method for controlling a job process provided by the invention has the following features:
1) Flexibility of non-integer resource allocation.
Taking GPU resources as an example, the invention supports users specifying non-integer GPU resources (such as 0.5 or 2.5 GPUs), breaking through the limitation of traditional whole-GPU allocation. The scheduling system marks partially used GPUs and allocates them dynamically, which markedly improves resource utilization and suits diverse load demands.
Compared with the fixed instance partitioning of MIG or the static quotas of MPS, the invention partitions resources with finer granularity and greater flexibility.
2) A token-bucket algorithm combined with quantified resource usage.
A token-bucket algorithm is introduced, with GPU utilization as the quantitative index of computing-resource consumption; the job's GPU use is limited dynamically through the allocation, deduction, and replenishment of tokens.
Proportional allocation and real-time adjustment of computing resources are realized, ensuring fairness and resource-utilization efficiency.
Unlike the coarse-grained limits of traditional Cgroups, the invention achieves fine-grained control of GPU computing capability.
3) Comprehensive design of a lag-compensation mechanism.
For the lag of token-bucket control (e.g., short-term overuse driving the token count negative), a combined compensation mechanism of time-series prediction, a buffer instruction queue, and GPU context switching is designed.
Predictive early blocking, buffering to avoid stalls, and forced intervention by context switching markedly improve the accuracy and smoothness of resource control.
4) A dual-mode strategy of hijacking on demand and performance optimization.
According to the number of GPUs allocated to the job (integer or non-integer), either the original CUDA library or the modified CUDA library is used, avoiding unnecessary hijacking overhead.
Optimal performance is kept in exclusive-GPU scenarios while strict isolation is enforced in shared scenarios, satisfying both performance and isolation requirements.
Compared with the uniform treatment of MPS or MIG, this dual-mode design is smarter and more efficient.
5) Systematic realization of multi-resource cooperative isolation.
By combining Linux Cgroups and namespaces, GPU computing resources are limited while CPU, memory, and other resources are isolated cooperatively, forming a complete job isolation system.
Unified management by the daemon strengthens the stability and security of the multi-resource environment.
Whereas the prior art focuses on a single resource (such as the GPU or the CPU), the invention realizes systematic management of multidimensional resources.
In order that those skilled in the art may better understand the above method, the overall process described above is explained below with reference to the drawings and specific embodiments. Fig. 2 is a schematic diagram of resources used by job processes according to an embodiment of the present invention. In fig. 2, the resource is a graphics processing unit, and two job processes, process one and process two, are taken as examples to describe how a process uses the graphics processing unit. Process one is located in namespace one and process two is located in namespace two. Each process first calls the compute unified device architecture library (CUDA library); the call then passes through the driver of the graphics processing unit and is finally processed by the graphics processing unit hardware. In the method provided by the invention, refined allocation and dynamic management of GPU computing resources are realized through CUDA call hijacking at the software layer, combined with a token bucket mechanism and dynamic monitoring. For non-integer GPU allocation scenarios, Kernel submission is hijacked to limit computing power; for exclusive-GPU scenarios, the native CUDA library is used to avoid performance loss. The scheme takes GPU utilization as the resource-consumption index, quantifies the allocation proportion through the token bucket, and introduces prediction and compensation mechanisms to overcome control hysteresis, thereby ensuring isolation, fairness, and high utilization when multiple jobs share a GPU.
FIG. 3 is an overall schematic diagram of the CUDA-hijacking-based GPU sharing and limiting scheme in a high-performance computing scenario. As shown in FIG. 3, the user submits a graphics processing unit job, specifying the model and number of graphics processing units; the scheduling system allocates a computing node and marks the resource state of the graphics processing unit. In the computing node layer, a computing node daemon receives the scheduling request, creates a namespace and a control group to start the job process; the user job runs and loads the compute unified device architecture library (original or modified version). In the compute unified device architecture library hijacking and control layer, calls to the compute unified device architecture library are intercepted, submission of the computing task function (Kernel function) is hijacked, and graphics processing unit resources are dynamically controlled and quantified in combination with the token bucket. In addition, in the monitoring and optimizing layer, the monitoring process uses the NVML library to monitor graphics processing unit utilization, and the hysteresis compensation layer performs hysteresis compensation through timing prediction, the buffered instruction queue, and context switching. Finally, the computing task function is executed on the graphics processing unit hardware and the computation result is output.
The CUDA-hijacking-based sharing and limiting scheme in the high-performance computing scenario aims to realize flexible allocation and strict isolation of GPU resources. The user may submit a job specifying an integer or non-integer number of GPUs, and the scheduling system allocates GPUs and marks their usage according to the resource state. To limit the job's usage of GPU, CPU, and memory resources, the scheme isolates processes with Linux Cgroups and namespaces, uses the modified CUDA library to hijack Kernel function calls, and dynamically controls GPU computing resources in combination with the token bucket algorithm. The monitoring process detects GPU utilization in real time and adjusts the token count; when tokens are exhausted, job execution is blocked or suspended, and resource contention is optimized through the buffered instruction queue and context switching, achieving efficient sharing and isolation.
In implementation, the implementation process of the GPU sharing and limiting scheme based on CUDA hijacking in the high-performance computing scene is as follows:
1. The user submits a GPU job, specifying the model and number of GPUs when submitting; the number may be an integer or a non-integer, for example, 0.5 GPUs of a certain model, or 2.5 GPUs of a certain model.
2. The scheduling system performs scheduling and finds computing nodes that meet the conditions. After GPU resources are allocated to each scheduled job, the scheduling system marks the allocated GPUs as fully or partially used, where fully used means the entire GPU is allocated to one job, and partially used means only part of a GPU is allocated to a job (e.g., 0.5 GPUs). A fully used GPU may no longer be used by other jobs; a partially used GPU may still be allocated to other jobs if sufficient resources remain.
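As a purely illustrative sketch (not the disclosed implementation), the following C snippet shows how a scheduler could keep fractional-use bookkeeping per GPU; the identifiers gpu_slot and try_allocate are hypothetical:

    /* Fractional GPU bookkeeping on the scheduler side (illustrative only). */
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        int    id;
        double capacity;   /* 1.0 == one whole GPU */
        double allocated;  /* sum of fractions handed out to jobs */
    } gpu_slot;

    /* Try to carve `request` GPUs (e.g. 0.5) out of one slot; on success
     * the slot is thereby marked partially or fully used. */
    static bool try_allocate(gpu_slot *g, double request) {
        if (g->allocated + request > g->capacity)
            return false;              /* not enough residual capacity */
        g->allocated += request;
        return true;
    }

    int main(void) {
        gpu_slot g = { .id = 0, .capacity = 1.0, .allocated = 0.0 };
        try_allocate(&g, 0.5);             /* job A takes half the GPU  */
        bool ok = try_allocate(&g, 0.5);   /* job B fills the remainder */
        printf("gpu %d: %.1f/%.1f used, second grant %s\n",
               g.id, g.allocated, g.capacity, ok ? "ok" : "rejected");
        /* a slot with allocated == capacity is fully used and skipped */
        return 0;
    }

In this view, a fully used GPU is simply a slot whose residual capacity is zero, matching the fully/partially used marking described above.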
3. To more strictly limit the use of resources by job processes (not only GPU resources but also CPU, memory, etc.), the Cgroup mechanism provided by the Linux system is used. The specific flow is as follows (a code sketch of the daemon-side setup follows this list):
1) One daemon, called the computing node daemon, runs on each computing node. When the scheduling system schedules a job to run, it sends a request to the computing node daemon together with the information of the job process to be started, so that the job process can be started.
2) The computing node daemon receives the request, creates a namespace and a Cgroup group on the computing node, and sets the Cgroup group's limits on resources such as CPU and memory.
3) The user job process is started and added to the previously created namespace and Cgroup group.
4) The root directory of the job process is switched to a designated directory containing the modified CUDA library, so that the modified CUDA library is used when the job process makes CUDA calls, allowing the user job process's calls to the CUDA library to be intercepted. Here, a distinction is made according to whether the number of allocated GPUs is an integer. In some cases the user's demand for GPU computing resources is relatively high and entire GPUs are allocated. In that case, because the job process has exclusive use of the GPU and need not share any GPU's resources with other processes, the isolation and limitation of GPU resources need not be considered and CUDA hijacking is unnecessary; the original CUDA library, rather than the modified one, is placed in the process's directory, which avoids the performance loss caused by CUDA hijacking. The modified CUDA library only needs to be used when the number of GPUs allocated to the job process is not an integer.
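As an illustrative sketch only, the following C program shows one way the computing node daemon could perform steps 2) to 4), assuming the cgroup v2 filesystem interface; all paths, limit values, and the job binary name are hypothetical:

    /* Daemon-side job setup sketch: cgroup limits, namespace, chroot, exec.
     * Requires root; paths and limits are placeholders. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static void write_file(const char *path, const char *val) {
        int fd = open(path, O_WRONLY);
        if (fd >= 0) { (void)write(fd, val, strlen(val)); close(fd); }
    }

    int main(void) {
        /* 2) create a per-job cgroup and set CPU / memory limits */
        mkdir("/sys/fs/cgroup/job42", 0755);
        write_file("/sys/fs/cgroup/job42/cpu.max", "50000 100000"); /* 0.5 CPU */
        write_file("/sys/fs/cgroup/job42/memory.max", "4294967296"); /* 4 GiB */

        /* 3) add this (soon to be the job) process to the cgroup */
        char pid[32];
        snprintf(pid, sizeof pid, "%d", getpid());
        write_file("/sys/fs/cgroup/job42/cgroup.procs", pid);

        /* 4) new mount namespace, then chroot into a directory whose library
         * path holds the original or modified CUDA library, so the job's
         * CUDA calls resolve there; a real daemon would also fork into a
         * fresh PID namespace before exec. */
        unshare(CLONE_NEWNS);
        (void)chroot("/jobs/job42/rootfs");   /* illustrative path */
        (void)chdir("/");
        execl("/bin/user_job", "user_job", (char *)NULL);
        return 1;   /* reached only if exec fails */
    }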
4. A GPU job uses the GPU by submitting computing tasks to it one after another; these may be any parallel computing tasks, such as matrix multiplication, convolution, or vector addition. The act of submitting a computing task is called a Kernel launch.
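Such interception is commonly realized as a shim library that exports the same symbol as the driver library and forwards to the real entry point. Below is a hedged C sketch of such a shim for cuLaunchKernel; the token check, check_tokens(), is an assumed placeholder for the bucket logic described later and is not part of any CUDA API:

    /* Shim exporting cuLaunchKernel: gate the launch on tokens, then forward. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <unistd.h>
    #include <cuda.h>     /* CUresult, CUfunction, CUstream */

    extern int check_tokens(void);  /* assumed: nonzero when submit may proceed */

    typedef CUresult (*launch_fn)(CUfunction, unsigned int, unsigned int,
                                  unsigned int, unsigned int, unsigned int,
                                  unsigned int, unsigned int, CUstream,
                                  void **, void **);

    CUresult cuLaunchKernel(CUfunction f,
                            unsigned int gx, unsigned int gy, unsigned int gz,
                            unsigned int bx, unsigned int by, unsigned int bz,
                            unsigned int sharedMemBytes, CUstream stream,
                            void **params, void **extra)
    {
        static launch_fn real_launch;
        if (!real_launch) {
            /* resolve the genuine driver entry point once */
            void *h = dlopen("libcuda.so.1", RTLD_LAZY | RTLD_LOCAL);
            real_launch = (launch_fn)dlsym(h, "cuLaunchKernel");
        }
        while (!check_tokens())
            usleep(1000);           /* block the submit until tokens return */
        return real_launch(f, gx, gy, gz, bx, by, bz,
                           sharedMemBytes, stream, params, extra);
    }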
5. A process (hereinafter referred to as the monitoring process) is started on the computing node; it monitors the GPU usage of each GPU job process using NVIDIA's NVML library. Every computing node running GPU jobs needs to run a monitoring process. The monitoring process runs in an endless loop, continuously detecting and recording GPU utilization, thereby obtaining a GPU-utilization history curve for each process, from which the average GPU utilization can be calculated.
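A minimal sketch of such a monitoring loop using NVML follows; it samples device-level utilization with nvmlDeviceGetUtilizationRates (per-process sampling would use a related NVML query), and the 60-sample window is an arbitrary illustrative choice:

    /* Monitoring-process sketch: poll GPU utilization and keep a history. */
    #include <nvml.h>
    #include <unistd.h>

    int main(void) {
        nvmlInit();
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(0, &dev);

        double history[60] = {0};
        for (unsigned tick = 0; ; tick++) {        /* the endless loop */
            nvmlUtilization_t util;
            if (nvmlDeviceGetUtilizationRates(dev, &util) == NVML_SUCCESS)
                history[tick % 60] = util.gpu;     /* percent, 0..100 */
            /* averaging `history` yields the usage fed to the token bucket */
            sleep(1);                              /* sampling period */
        }
        /* a real daemon would call nvmlShutdown() on exit */
    }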
6. The token bucket scheme for controlling a job process's GPU computing-resource usage.
The monitored GPU usage represents the usage of computing resources in the GPU: the higher the GPU usage, the more of the GPU's computing resources are in use. The scheme controls the job process's use of the GPU through the token bucket as follows. Assume the computing power of one GPU is 100 (set to 100 only for simplicity of calculation), corresponding to 100 tokens in the token bucket. Initially, a certain number of tokens is allocated to a user's GPU job process; for example, allocating 50 tokens means the job may use 50% of the GPU's computing resources. As the job process uses the GPU, GPU usage rises; after a rise is detected, the corresponding number of tokens is deducted, and once the tokens fall to 0, the job process is no longer allowed to submit Kernel functions to the GPU hardware for execution.
7. Token calculation method. This is described below by way of a specific example; a code sketch of the same arithmetic follows the example.
Assuming that the computational power of one GPU corresponds to 100 tokens in the token bucket, the number of tokens allocated for a certain GPU job process is 50.
1) Time point 1: the initial stage after the job process starts.
The average GPU usage of the job process is 20%. According to the relationship between GPU usage and tokens, the number of tokens used by the job is 100 × 20% = 20. Because the job process was initially allocated 50 tokens, 30 remain after 20 are used, and the number of tokens remaining available to the process in the token bucket is updated to 30.
If the job process continues to submit Kernel functions at this point, the submission is allowed, because the number of tokens remaining in the bucket is greater than 0, i.e., tokens are still available to the job process.
2) Time point 2: the load increases.
The average GPU usage of the job process reaches 50%. According to the relationship between GPU usage and tokens, the number of tokens used is 100 × 50% = 50. Because the job process was initially allocated 50 tokens, 0 remain after 50 are used, and the number of tokens available to the process in the bucket is updated to 0. If the job process tries to continue submitting Kernel functions at this point, the submission is blocked, because the number of tokens remaining in the bucket equals 0, i.e., no tokens are available to the job process; the Kernel function is resubmitted once tokens become available in the bucket again.
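The following C sketch reproduces this arithmetic; the bucket structure and function names are illustrative only:

    /* Token deduction rule from the worked example above. */
    #include <stdio.h>

    #define GPU_CAPACITY_TOKENS 100.0

    typedef struct { double grant; double remaining; } bucket;

    /* Recompute the bucket from the latest average GPU usage (0.0 .. 1.0). */
    static void update_bucket(bucket *b, double avg_usage) {
        double used = GPU_CAPACITY_TOKENS * avg_usage;
        b->remaining = b->grant - used;   /* may briefly go negative (lag) */
    }

    static int may_submit(const bucket *b) { return b->remaining > 0.0; }

    int main(void) {
        bucket b = { .grant = 50.0, .remaining = 50.0 };
        update_bucket(&b, 0.20);   /* time point 1: 20% usage -> 30 left */
        printf("t1: %.0f tokens, submit %s\n", b.remaining,
               may_submit(&b) ? "allowed" : "blocked");
        update_bucket(&b, 0.50);   /* time point 2: 50% usage -> 0 left  */
        printf("t2: %.0f tokens, submit %s\n", b.remaining,
               may_submit(&b) ? "allowed" : "blocked");
        return 0;
    }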
8. Because it is difficult to accurately estimate the GPU computing power consumed by each Kernel function a job process submits, limiting the job process's GPU use through the token bucket has a certain hysteresis: GPU usage may briefly run too high, driving the token count of the bucket negative. This problem is compensated for as follows:
1) In the first method, when the tokens in the bucket fall to 0 or below, the job process is blocked from continuing to submit Kernel functions, preventing GPU utilization from rising further. For more accurate control, a timing-prediction algorithm may predict the process's next usage from its historical GPU usage; if the tokens in the bucket have dropped to approximately 0 and GPU usage is predicted to be rising, Kernel function submission can be blocked in advance, i.e., before the tokens actually run out.
A Kernel function is submitted by placing an instruction into an instruction queue, from which the GPU hardware fetches instructions for execution. When the instruction submitted by a Kernel function is hijacked and the submission is being blocked, the instruction is not placed directly into the instruction queue but is temporarily stored in a buffer area (called the buffered instruction queue). Because the buffered instruction queue has limited size, a process that keeps submitting Kernel functions may exhaust its space; the process can then only be suspended, i.e., the operating system suspends it from running on the CPU (this state is called the CPU-suspended state, to distinguish it from GPU-side suspension). A suspended process submits no new instructions to the GPU, ensuring that the buffered instruction queue does not overflow.
The purpose of the buffered instruction queue is to avoid, as far as possible, the performance loss caused by directly suspending the process.
2) In the second method, when the monitoring process detects that a process's GPU utilization is too high, a GPU context switch is forced: the context of the over-using process is switched out, the GPU contexts of other processes are switched in for execution (the context of a higher-priority process may be chosen, or another scheduling algorithm may be used), and execution of the over-using process's GPU context is suspended, i.e., it is no longer scheduled onto the GPU hardware until the token bucket again holds enough tokens, at which point execution resumes. Stopping the GPU context from executing on the hardware in this way prevents GPU utilization from rising further.
3) Methods one and two can be used together: method one focuses on controlling the process in advance, when its GPU utilization is about to reach the upper limit, while method two focuses on control after its GPU utilization has reached or exceeded the upper limit. A combined sketch follows.
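A hedged C sketch of one way to combine the two methods into a single decision routine is given below; the near-empty threshold, buffer capacity, and all helper names are assumptions made for illustration:

    /* Combined compensation policy: block early on a rising trend, buffer
     * blocked submits, suspend only when the side buffer is full. */
    #include <stdbool.h>

    #define BUF_CAP 256

    typedef struct { int count; void *cmd[BUF_CAP]; } cmd_buffer;

    /* Crude rising-trend test over the most recent usage samples; a real
     * implementation would use a proper timing-prediction algorithm. */
    static bool usage_rising(const double *hist, int n) {
        return n >= 2 && hist[n - 1] > hist[n - 2];
    }

    typedef enum { SUBMIT, BUFFER, SUSPEND } verdict;

    static verdict decide(double tokens, const double *hist, int n,
                          const cmd_buffer *buf) {
        bool nearly_empty = tokens <= 5.0;     /* "close to 0" threshold  */
        if (tokens > 0.0 && !(nearly_empty && usage_rising(hist, n)))
            return SUBMIT;                     /* method 1: normal path   */
        if (buf->count < BUF_CAP)
            return BUFFER;                     /* park in the side queue  */
        return SUSPEND;                        /* buffer full: CPU-suspend */
    }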
9. When the process's use of GPU resources is controlled by the above methods, its GPU utilization falls, and the monitoring process updates the token count in the bucket after observing the change. When the token count is greater than 0, execution of the process's context can be resumed, as follows (a sketch of this resume path follows the list):
1) If the process's GPU context has been suspended, execution of the GPU context on the GPU hardware is resumed.
2) If instructions remain in the buffered instruction queue, they wait to be moved into the instruction queue. Because the GPU context has resumed executing on the hardware, instructions in the instruction queue are gradually consumed; once enough space is free, instructions from the buffered instruction queue are moved in. If the process is still in the CPU-suspended state at this point, it is resumed and allowed to submit Kernel functions again.
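The following C fragment sketches this resume path; every helper (gpu_ctx_resume, drain_buffer_into_queue, and so on) is a hypothetical placeholder for driver- or daemon-level operations that the disclosure does not pin down:

    /* Resume path once the token count becomes positive again. */
    #include <signal.h>
    #include <stdbool.h>
    #include <sys/types.h>
    #include <unistd.h>

    extern bool gpu_ctx_suspended(pid_t job);
    extern void gpu_ctx_resume(pid_t job);
    extern bool drain_buffer_into_queue(pid_t job); /* true once buffer empty */
    extern bool cpu_suspended(pid_t job);

    void on_tokens_available(pid_t job) {
        if (gpu_ctx_suspended(job))
            gpu_ctx_resume(job);        /* step 1: let the context run again */
        while (!drain_buffer_into_queue(job))
            usleep(1000);               /* step 2: move buffered submits over */
        if (cpu_suspended(job))
            kill(job, SIGCONT);         /* step 3: allow new Kernel launches */
    }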
10. After the user job process ends, the related resources are released, and the computing node daemon deletes the previously created namespace and Cgroup group.
The CUDA-hijacking-based GPU isolation scheme provided by the invention is an innovative high-performance computing resource-management method, showing particular uniqueness in GPU resource sharing and fine-grained control. The technical scheme can bring the following beneficial effects:
1. The GPU resource utilization rate is improved.
By supporting allocation of non-integer numbers of GPUs (e.g., 0.5 or 2.5 GPUs), this approach allows GPU computing resources to be partitioned more flexibly, avoiding the resource idling caused by traditional whole-GPU allocation. For example, a job requiring only 50% of the computing power need not monopolize an entire GPU; the remaining resources may be allocated to other jobs.
In a multi-user or multi-task high-performance computing cluster, this markedly improves GPU utilization and reduces resource waste, and is especially suitable for scenarios with many small, lightly loaded jobs.
2. Job isolation and stability are enhanced.
CPU, memory, and other resources are limited by Cgroups, and GPU computing resources are finely controlled by combining CUDA hijacking with the token bucket mechanism, ensuring that job processes do not interfere with one another. Even if a job attempts to overuse GPU resources, it is confined to its allocated range.
This avoids one job excessively occupying the GPU and blocking others, improves the overall running stability of the cluster, and is particularly suitable for high-performance computing clusters and cloud GPU-sharing scenarios.
3. Performance overhead is reduced flexibly.
The scheme distinguishes between integer and non-integer GPU allocation scenarios. For a job with exclusive use of whole GPUs, the original CUDA library is used, avoiding the extra performance loss of hijacking; for a job sharing a GPU, the modified CUDA library is enabled to enforce resource limits.
This hijack-on-demand design maximizes performance while guaranteeing isolation. Compute-intensive jobs (e.g., deep-learning training) incur no extra overhead when monopolizing a GPU, while efficient isolation is still achieved in shared scenarios, balancing performance and fairness.
4. Adaptability of dynamic resource management.
GPU utilization is detected in real time by the monitoring process, and resource allocation can be adjusted dynamically by combining the token bucket with the context-switching mechanism. When the workload varies (e.g., usage rising from 20% to 50%), the system can deduct or replenish tokens in time, and block or suspend the job when tokens are exhausted.
This dynamic behavior adapts to the load fluctuations common in high-performance computing (such as the iteration-to-iteration changes of AI model training), prevents resource overuse, allows job execution to resume quickly after resources are released, and improves the responsiveness of the system.
5. Reducing the hysteresis impact of resource contention.
To address the hysteresis of the token bucket (short-term overruns of GPU utilization driving the token count negative), the scheme introduces compensation mechanisms such as timing prediction, the buffered instruction queue, and context switching. Prediction blocks submissions in advance, the buffer queue avoids direct suspension, and context switching forcibly intervenes in overuse.
These measures reduce the system jitter and job interruption caused by resource overuse; especially in highly concurrent scenarios, burst loads can be handled more smoothly, ensuring quality of service.
6. Efficient execution of diverse computing tasks is supported.
Whether the Kernel functions perform matrix multiplication, convolution, vector addition, or other operations, the scheme ensures, through hijacked CUDA calls and token bucket control, that tasks run efficiently within their allocated resources.
The method is suitable for multiple GPU intensive application scenes such as deep learning, scientific calculation, image processing and the like, and the universality and practicality of the scheme are enhanced.
7. Simplifying cluster management and maintenance.
The computing node daemon uniformly manages the operation starting, the resource limiting and the releasing, and provides a utilization rate history curve and an average value by matching with the monitoring process, so that an administrator can analyze the utilization condition of resources and optimize a scheduling strategy.
The complexity of cluster operation and maintenance is reduced, and an administrator can adjust token allocation or scheduling rules based on monitoring data to further optimize resource allocation efficiency.
8. User experience and fairness are improved.
Users can flexibly specify the amount of GPU resources (integer or non-integer) according to actual demand, and the token bucket and isolation mechanisms ensure fair resource use: all jobs compete for tokens on equal terms, avoiding the situation where aggressive jobs crowd out others.
Users need not worry about resource preemption and can match their requirements more precisely when submitting jobs, improving the usage experience; this is competitive in commercial shared-GPU clusters and also facilitates accurate metering and billing.
The above embodiments describe the control method of the job process in detail; the invention also provides corresponding embodiments of a control device and a server for the job process. It should be noted that the device embodiments are described from two angles: one based on functional modules and the other based on hardware.
The embodiment of the invention provides a control device for a job process, which comprises the following components based on the angle of a functional module:
The first control module is used for controlling the operation process to call a preset file library and obtaining the context which is recorded in the preset file library and corresponds to the operation process and used for representing the used resource, wherein in the preset file library, different operation processes correspond to different contexts;
The allocation module is used for allocating tokens for representing the used resources for the job process according to the context corresponding to the job process;
the acquisition and updating module is used for acquiring the resource utilization rate and updating the number of tokens corresponding to the operation process according to the resource utilization rate;
And the second control module is used for controlling the job process to use the resources according to the updated token quantity.
In some embodiments, the control device for a job process further includes:
A receiving module, configured to receive, through the scheduling system, a job submitted by a user, where the job includes the model of the resource to be used and the quantity of the resource;
the allocation and marking module is used for selecting the computing nodes meeting the preset conditions according to the jobs submitted by the users by the scheduling system, and marking the use condition of the resources after the resources on the computing nodes are allocated to the jobs, wherein the preset conditions are that the number of the resources on the computing nodes is greater than or equal to the number of the resources in the jobs submitted by the users, and the use condition of the resources comprises total use or partial use.
In some embodiments, the control device for a job process further includes:
A first creation module, configured to create a daemon and, through the daemon, create a namespace and a control group for limiting the resources used by the job process;
the receiving and controlling module is used for receiving the operation process to be started, which is sent by the scheduling system, through the daemon process and controlling the operation process to be started;
and the adding module is used for adding the operation process into the namespaces and the control groups.
In some embodiments, the first control module comprises:
The first putting and controlling module is used for putting the catalog of the operation process into an original preset file library to control the operation process to call the preset file library under the condition that the number of the resources allocated by the operation process is detected to be an integer number;
The second placing and controlling module is used for modifying the files in the original preset file library to obtain a new preset file library under the condition that the number of the resources distributed by the operation process is detected to be non-integer, placing the catalogue of the operation process into the new preset file library to control the operation process to call the preset file library, wherein the new preset file library allows part of the operation processes to directly access and manage the resources.
In some embodiments, the acquiring submodule in the acquiring and updating module is configured to acquire the resource usage rate.
The acquisition submodule comprises:
the second creation module is used for creating a monitoring process for representing the resource utilization rate of each operation process;
The first acquisition module is used for acquiring a resource utilization rate history curve of the operation process through the monitoring process;
and the first determining module is used for determining the average resource utilization rate according to the resource utilization rate history curve of the job process to serve as the resource utilization rate.
In some embodiments, the control device for a job process further includes:
A second obtaining module, configured to obtain a computing capability of a hardware device for parallel computing and set a first preset number of tokens in a token bucket according to the computing capability of the hardware device for parallel computing;
in some embodiments, the allocation module comprises:
A first allocation submodule, configured to allocate 0 tokens for characterizing resource use to the job process if it is determined, according to the context corresponding to the job process, that the job process is prohibited from submitting computing tasks to the target resource;
And the second allocation sub-module is used for allocating a second preset number of tokens used for representing the used resources for the job process according to the context corresponding to the job process if the permission to submit the calculation task to the target resource is determined according to the context corresponding to the job process, wherein the first preset number is larger than the second preset number.
In some embodiments, the update module in the obtaining and updating module is configured to update the number of tokens corresponding to the job process according to the resource usage rate.
The updating module specifically comprises:
The second determining module is used for determining the current token number corresponding to the current resource utilization rate according to the preset relation between the utilization rate of the hardware equipment for parallel computation and the token number;
The eliminating module is used for eliminating the tokens of the current token number from the tokens of the first preset number to update the number of tokens corresponding to the operation process, wherein the updated number of tokens is a difference value obtained by subtracting the current token number from the first preset number.
In some embodiments, the second control module includes:
the detection and permission module is used for permitting the job process to submit the calculation task so as to use the resource under the condition that the updated token quantity is detected to be larger than a first preset value;
and the detection and prohibition module is used for prohibiting the job process from submitting the calculation task to prohibit the use of the resource under the condition that the updated token quantity is detected to be equal to the first preset value.
In some embodiments, the control device for a job process further includes:
A prediction module, configured to predict, through a timing-prediction algorithm, the usage trend of the hardware device for parallel computing when the updated token count is detected to be a second preset value, where the second preset value is greater than the first preset value and the difference between the second preset value and the first preset value is smaller than a preset difference;
And the prohibiting module is used for prohibiting the job process from submitting the calculation task to prohibit the use of the resource if the use trend is detected to be the ascending trend.
In some embodiments, the control device for a job process further includes:
A third creation module, configured to create in advance a buffered instruction queue for storing blocked computing tasks;
And the placement module is used for placing the calculation task which is forbidden to be submitted into the buffer instruction queue after determining that the job process is forbidden to submit the calculation task to the target resource according to the context corresponding to the job process or after prohibiting the job process from submitting the calculation task.
In some embodiments, the control device for a job process further includes:
A third acquisition module, configured to acquire the occupancy of computing tasks in the buffered instruction queue;
and the suspension module is used for suspending the operation of the job process in the central processing unit from the operating system if the calculation task occupies the whole buffer instruction queue.
In some embodiments, the control device for a job process further includes:
the cutting-out and suspending module is used for forcing the context of the target job process to be cut out and suspending the execution of the context of the target job process if the target job process with the resource utilization rate larger than the preset utilization rate is detected;
And the cut-in module is used for controlling the context of other job processes to cut into the execution of the hardware equipment for parallel computation, wherein the other job processes are the job processes except the target job process in all the job processes.
In some embodiments, the cut-in module includes:
a fourth obtaining module, configured to obtain priority orders of all other job processes;
and the switching-in sub-module is used for selecting the job process with the highest priority and controlling the context of the job process with the highest priority to switch into the execution of the hardware equipment for parallel computation.
In some embodiments, the control device for a job process further includes:
The issuing module is used for issuing the tokens to the job process which is forbidden to submit the calculation task again if the resource utilization rate of the job process which is forbidden to submit the calculation task is detected to be reduced after the updated token number is detected to be equal to 0;
And the recovery module is used for recovering the execution of the context of the job process which is forbidden to submit the calculation task under the condition that the number of tokens is detected to be larger than 0.
In some embodiments, the recovery module includes:
The first restoring submodule is used for restoring the execution of the context of the job process which is forbidden to submit the calculation task on the hardware equipment for parallel calculation if the context of the job process which is forbidden to submit the calculation task is detected to be in a pause state;
Or the migration module is used for migrating the calculation task in the buffer instruction queue to the instruction queue if the residual space exists in the instruction queue, wherein the instruction queue is used for storing the calculation task which is allowed to be submitted;
And the second recovery submodule is used for recovering the operation of the operation process in the central processing unit from the operating system if the operation process migrated to the instruction queue is detected to be in a state of suspending the operation process in the central processing unit in the operating system.
In some embodiments, the control device for a job process further includes:
the first release module is used for releasing resources occupied by the operation process under the condition of detecting the execution end of the operation process;
A second release module, configured to control the daemon to release the namespace and the control group for limiting the resources used by the job process.
Since the embodiments of the apparatus portion and the embodiments of the method portion correspond to each other, the embodiments of the apparatus portion are referred to the description of the embodiments of the method portion, and are not repeated herein.
Fig. 4 is a block diagram of a server according to another embodiment of the present invention. The present embodiment is based on hardware angle, as shown in fig. 4, and the server includes:
a memory 20 for storing a computer program;
a processor 21 for implementing the steps of the control method of the job process as mentioned in the above embodiments when executing the computer program.
Processor 21 may include one or more processing cores, such as a 4-core or 8-core processor. Processor 21 may be implemented in at least one hardware form of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA). Processor 21 may also comprise a main processor and a coprocessor: the main processor, also called the CPU, processes data in the wake state; the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, processor 21 may be integrated with a GPU responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, processor 21 may also include an AI processor for handling machine-learning computing operations.
Memory 20 may include one or more computer-readable storage media, which may be non-transitory. Memory 20 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In this embodiment, memory 20 at least stores a computer program 201 which, when loaded and executed by processor 21, implements the relevant steps of the job-process control method disclosed in any of the foregoing embodiments. In addition, the resources stored in memory 20 may further include an operating system 202 and data 203, stored transiently or persistently. The operating system 202 may include Windows, Unix, Linux, and the like. The data 203 may include, but is not limited to, data related to the above control method of the job process.
In some embodiments, the server may further include a display 22, an input-output interface 23, a communication interface 24, a power supply 25, and a communication bus 26.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is not limiting and may include more or fewer components than illustrated.
The server provided by the embodiment of the invention comprises a memory and a processor, wherein the processor can realize the method of controlling the operation process when executing the program stored in the memory.
The embodiment of the invention also provides a computer program product, which comprises a computer program/instruction, wherein the computer program/instruction realizes the steps of the control method of the job process when being executed by a processor.
Finally, the invention also provides a corresponding embodiment of the computer readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps as described in the method embodiments above.
It will be appreciated that the methods of the above embodiments, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored on a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, which performs all or part of the steps of the methods of the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The computer-readable storage medium provided by the invention stores a program implementing the above control method of the job process, with the same effects as described above.
The method, device, product, server, and medium for controlling a job process provided by the invention have been described in detail above. Each embodiment in this specification is described in a progressive manner, each focusing on its differences from the others, and identical or similar parts among the embodiments may be referred to one another. The device disclosed in the embodiments corresponds to the method disclosed in the embodiments, so its description is relatively brief; for relevant details, refer to the description of the method. It should be noted that those skilled in the art may make modifications to the present invention without departing from the spirit of the invention.
It should also be noted that, in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between the entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus comprising a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

Claims (19)

1. A method of controlling a job process, applied to a computing node, the method comprising:
The method comprises the steps of controlling a job process to call a preset file library, and obtaining contexts which correspond to the job processes recorded in the preset file library and are used for representing used resources, wherein in the preset file library, different job processes correspond to different contexts;
Distributing tokens for representing the used resources for the job process according to the context corresponding to the job process;
acquiring the resource utilization rate and updating the number of tokens corresponding to the job process according to the resource utilization rate;
Controlling the operation process to use resources according to the updated token quantity;
The controlling the operation process to call the preset file library comprises the following steps:
under the condition that the number of resources allocated by the operation process is detected to be an integer, the catalog of the operation process is put into an original preset file library to control the operation process to call the preset file library, wherein the original preset file library allows the operation process to directly access and manage the resources;
Under the condition that the number of resources allocated by the operation process is detected to be non-integer, modifying the files in the original preset file library to obtain a new preset file library, and placing the catalog of the operation process into the new preset file library to control the operation process to call the preset file library, wherein the new preset file library allows part of operation processes to directly access and manage the resources.
2. The method for controlling a job process according to claim 1, wherein before the controlling the job process calls a preset file library and obtains a context for characterizing a use resource corresponding to the job process recorded in the preset file library, the method further comprises:
Receiving a job submitted by a user through a scheduling system, wherein the job submitted by the user comprises a resource model and a resource quantity to be used, and the resource quantity is an integer or a non-integer;
after the scheduling system receives the job submitted by the user, the method further comprises:
Selecting computing nodes meeting preset conditions according to the jobs submitted by the users, and marking the use condition of the resources after the resources on the computing nodes are allocated to the jobs, wherein the preset conditions are that the number of the resources on the computing nodes is greater than or equal to the number of the resources in the jobs submitted by the users, and the use condition of the resources comprises total use or partial use.
3. The method for controlling a job process according to claim 2, wherein after receiving a job submitted by a user through a scheduling system, the controlling the job process calls a preset file library, and before acquiring a context for characterizing a use resource corresponding to the job process recorded in the preset file library, further comprises:
creating a daemon and creating a namespace and a control group for characterizing resources used by the restricted job process by the daemon;
Receiving a to-be-started operation process sent by the scheduling system through a daemon process, and controlling the to-be-started operation process to be started;
Adding a job process to the namespace and the control group;
The controlling the operation process to call the preset file library comprises the following steps:
And controlling the job processes in the namespaces and the control groups to call a preset file library.
4. The method of claim 1, wherein the obtaining the resource usage rate comprises:
Creating a monitoring process for representing the resource utilization rate of monitoring each job process;
Acquiring a resource utilization rate history curve of the operation process through the monitoring process;
And determining average resource utilization rate according to the resource utilization rate history curve of the job process to serve as the resource utilization rate.
5. A method of controlling a job process according to claim 3, wherein the resource is a hardware device for parallel computing, and further comprising, before the assigning a token for characterizing use of the resource to the job process according to a context corresponding to the job process:
Acquiring the computing capacity of a hardware device for parallel computing and setting a first preset number of tokens in a token bucket according to the computing capacity of the hardware device for parallel computing;
the allocating the token for representing the use resource for the job process according to the context corresponding to the job process comprises the following steps:
if the job process is determined to prohibit submitting the calculation task to the target resource according to the context corresponding to the job process, the number of tokens which are distributed for the job process and are used for representing the used resource is 0;
If the fact that the calculation task is allowed to be submitted to the target resource is determined according to the context corresponding to the job process, a second preset number of tokens used for representing the used resource is distributed to the job process according to the context corresponding to the job process, wherein the first preset number is larger than the second preset number.
6. The method of claim 5, wherein updating the number of tokens corresponding to the job process according to the resource usage rate comprises:
Determining the current token number corresponding to the current resource utilization rate according to the preset relation between the utilization rate of the hardware equipment for parallel computation and the token number;
And eliminating the tokens of the current token number from the tokens of the first preset number to update the number of tokens corresponding to the operation process, wherein the updated number of tokens is a difference value obtained by subtracting the current token number from the first preset number.
7. The method of controlling a job process according to claim 6, wherein controlling the job process to use the resource according to the updated token number comprises:
Allowing the job process to submit the calculation task to use the resource under the condition that the updated token number is detected to be larger than a first preset value;
And under the condition that the number of the updated tokens is equal to the first preset value, prohibiting the job process from submitting the calculation task so as to prohibit the use of resources.
8. The method for controlling a job process according to claim 7, characterized in that the method further comprises:
Predicting the use trend of the hardware equipment for parallel computation through a time sequence prediction algorithm under the condition that the number of the updated tokens is detected to be a second preset value, wherein the second preset value is larger than the first preset value, and the difference value between the second preset value and the first preset value is smaller than the preset difference value;
And if the use trend is detected to be an ascending trend, prohibiting the job process from submitting the calculation task so as to prohibit the use of the resource.
9. The method of controlling a job process according to claim 8, further comprising:
creating, in advance, a buffered instruction queue for storing blocked computing tasks;
And after determining that the job process prohibits submitting the calculation task to the target resource according to the context corresponding to the job process, or after prohibiting the job process from submitting the calculation task, placing the calculation task prohibited from being submitted into a buffer instruction queue.
10. The method of controlling a job process according to claim 9, further comprising:
acquiring the occupancy of computing tasks in the buffered instruction queue;
If the computing task occupies the whole buffer instruction queue, suspending the operation of the job process in the central processing unit from the operating system.
11. The method of controlling a job process according to claim 10, further comprising:
if the target operation process with the resource utilization rate larger than the preset utilization rate is detected, the context of the target operation process is forced to be cut out, and execution of the context of the target operation process is suspended;
and controlling the contexts of other job processes to be cut into the execution of the hardware equipment for parallel computation, wherein the other job processes are the job processes except the target job process in all the job processes.
12. The method according to claim 11, wherein the controlling the contexts of the other job processes to be cut into the execution of the hardware device for parallel computing includes:
Acquiring the priority order of all other job processes;
and selecting the job process with the highest priority, and controlling the context of the job process with the highest priority to cut into the execution of the hardware equipment for parallel computation.
13. The method of controlling a job process according to claim 12, further comprising:
after the updated token quantity is equal to 0, if the resource utilization rate of the job process which is forbidden to submit the calculation task is detected to be reduced, the token is issued to the job process which is forbidden to submit the calculation task again;
in the event that a number of tokens greater than 0 is detected, the execution of the context of the job process that prohibited the submission of the computing task is resumed.
14. The method of claim 13, wherein the resuming the execution of the context of the job process that is prohibited from submitting the computing task comprises:
if the context of the job process which is forbidden to submit the calculation task is detected to be in a pause state, the execution of the context of the job process which is forbidden to submit the calculation task on the hardware equipment for parallel calculation is resumed;
Or if the residual space exists in the instruction queue, migrating the calculation task in the buffer instruction queue to the instruction queue, wherein the instruction queue is used for storing the calculation task which is allowed to be submitted;
and if the job process migrated to the instruction queue is detected to be in a state of suspending the operation of the job process in the central processing unit in the operating system, resuming the operation of the job process in the central processing unit from the operating system.
15. A method of controlling a job process according to claim 3, further comprising:
releasing resources occupied by the operation process under the condition of detecting the end of the operation process;
the daemon is controlled to release the namespace and a control group for characterizing resources that limit the use of the resources by the job process.
16. A control device for a job process, applied to a computing node, the control device comprising:
The first control module is used for controlling the operation process to call a preset file library and obtaining the context which is recorded in the preset file library and corresponds to the operation process and used for representing the used resource, wherein in the preset file library, different operation processes correspond to different contexts;
The allocation module is used for allocating tokens for representing the used resources for the job process according to the context corresponding to the job process;
the acquisition and updating module is used for acquiring the resource utilization rate and updating the number of tokens corresponding to the operation process according to the resource utilization rate;
the second control module is used for controlling the operation process to use resources according to the updated token quantity;
The first control module includes:
The first putting and controlling module is used for putting the catalog of the operation process into an original preset file library to control the operation process to call the preset file library under the condition that the number of the resources allocated by the operation process is detected to be an integer number;
The second placing and controlling module is used for modifying the files in the original preset file library to obtain a new preset file library under the condition that the number of the resources distributed by the operation process is detected to be non-integer, placing the catalogue of the operation process into the new preset file library to control the operation process to call the preset file library, wherein the new preset file library allows part of the operation processes to directly access and manage the resources.
17. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of controlling the progress of a job as claimed in any one of claims 1 to 15.
18. A server, characterized by comprising:
A memory for storing a computer program;
A processor for implementing the steps of the method of controlling a job process according to any one of claims 1 to 15 when executing the computer program.
19. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the method of controlling a job process according to any one of claims 1 to 15.
CN202510919781.2A 2025-07-04 2025-07-04 A method, device, product, server and medium for controlling an operation process Active CN120429091B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510919781.2A CN120429091B (en) 2025-07-04 2025-07-04 A method, device, product, server and medium for controlling an operation process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510919781.2A CN120429091B (en) 2025-07-04 2025-07-04 A method, device, product, server and medium for controlling an operation process

Publications (2)

Publication Number Publication Date
CN120429091A CN120429091A (en) 2025-08-05
CN120429091B true CN120429091B (en) 2025-09-12

Family

ID=96552731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510919781.2A Active CN120429091B (en) 2025-07-04 2025-07-04 A method, device, product, server and medium for controlling an operation process

Country Status (1)

Country Link
CN (1) CN120429091B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117827423A (en) * 2023-11-28 2024-04-05 济南浪潮数据技术有限公司 GPU sharing method and device, electronic equipment and storage medium
CN118132217A (en) * 2022-12-02 2024-06-04 中国电信股份有限公司 Task scheduling method and device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9197670B2 (en) * 2013-10-08 2015-11-24 Centrify Corporation Method and apparatus for creating conditional windows process tokens
KR102092459B1 (en) * 2018-06-20 2020-03-23 한국과학기술원 Method and System to manage and schedule GPU memory resource in Container-based virtualized environment
WO2025035366A1 (en) * 2023-08-14 2025-02-20 华为技术有限公司 Resource scheduling method and apparatus
CN119356805A (en) * 2024-07-25 2025-01-24 浙江利尔达物联网技术有限公司 A module resource scheduling method based on dynamic priority
CN119336448A (en) * 2024-10-17 2025-01-21 中国电信股份有限公司技术创新中心 Business processing method, processing device, equipment, storage medium and program product

Also Published As

Publication number Publication date
CN120429091A (en) 2025-08-05

Similar Documents

Publication Publication Date Title
JP5165960B2 (en) Balancing resource sharing and application latency in data processing systems
EP3425502B1 (en) Task scheduling method and device
CN109564528B (en) System and method for computing resource allocation in distributed computing
US8645592B2 (en) Balancing usage of hardware devices among clients
US9507631B2 (en) Migrating a running, preempted workload in a grid computing system
JP5939740B2 (en) Method, system and program for dynamically allocating resources
US7975269B2 (en) Parallel processor methods and apparatus
WO2004012080A2 (en) Method for dynamically allocating and managing resources in a computerized system having multiple consumers
KR20050016170A (en) Method and system for performing real-time operation
CN114721818B (en) A GPU time-sharing sharing method and system based on Kubernetes cluster
CN115576683A (en) A coroutine pool scheduling management method, system, device and storage medium
CN120429091B (en) A method, device, product, server and medium for controlling an operation process
CN116841751B (en) Policy configuration method, device and storage medium for multi-task thread pool
CN110333899B (en) Data processing method, device and storage medium
US20240378084A1 (en) Methods and apparatus for processing data
CN115102851B (en) Fusion platform for HPC and AI fusion calculation and resource management method thereof
CN110968418B (en) Scheduling method and device for large-scale constrained concurrent tasks based on signals and slots
CN110955644A (en) IO control method, device, equipment and storage medium of storage system
US12210521B2 (en) Short query prioritization for data processing service
JP7478918B2 (en) Task intelligent processing method based on distributed heterogeneous system
Jeong et al. Dynamic Resource Adjustment Operator Based on Autoscaling for Improving Distributed Training Job Performance on Kubernetes
KR20230143025A (en) Resource-aware device allocation of multiple gpgpu applications on multi-accelerator system
Hui et al. FluidFaaS: A Dynamic Pipelined Solution for Serverless Computing with Strong Isolation-based GPU Sharing
CN119271351A (en) A virtual processor scheduling method, device, equipment and medium
JPH0424828A (en) Multi-task control system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant