CN120469816A - A method, system, computer device and storage medium for dynamic computing power division - Google Patents

A method, system, computer device and storage medium for dynamic computing power division

Info

Publication number
CN120469816A
CN120469816A
Authority
CN
China
Prior art keywords
task
computing resources
resource
segmentation
executed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510962337.9A
Other languages
Chinese (zh)
Inventor
于淼
张坤睿
胡文博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Raycom Joint Creation Tianjin Information Technology Co ltd
Original Assignee
Raycom Joint Creation Tianjin Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Raycom Joint Creation Tianjin Information Technology Co ltd filed Critical Raycom Joint Creation Tianjin Information Technology Co ltd
Priority to CN202510962337.9A priority Critical patent/CN120469816A/en
Publication of CN120469816A publication Critical patent/CN120469816A/en

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The present application belongs to the technical field of computers and discloses a method, system, computer device and storage medium for dynamic computing power splitting. The method includes: collecting historical system load data and task execution patterns, and using a machine learning model to predict future resource usage; monitoring overall system load data; searching for available computing resources in the system; triggering a computing power splitting decision if the overall system load data exceeds an overall load threshold, the resource demand of a single task exceeds a resource shortage threshold, and/or the predicted future resource usage exceeds the system's computing capacity; matching the available computing resources against the resource parameters of the tasks to be executed to obtain a compatibility result; formulating a splitting strategy based on the compatibility result, the available computing resources and the resource parameters of the tasks to be executed; and reallocating the available computing resources according to the splitting strategy. The method addresses the technical problems of low resource utilization and long waiting times in computing power splitting.

Description

Dynamic computing power splitting method, system, computer device and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular to a dynamic computing power splitting method and system, a computer device, and a storage medium.
Background
Currently, computing power splitting refers to dividing the computing resources of a physical GPU into multiple virtual GPUs (commonly called vGPUs). The purpose is to increase GPU utilization by allowing multiple users or applications to share the computing resources of the same physical GPU at the same time.
Computing power is mainly split in one of two ways: by physical device or by time slice. When splitting by physical device, as in a data center where servers or computing clusters are assigned to specific users or tasks, resource utilization is low, some devices may sit idle, and cluster management is complex. When splitting by time slice, as in early mainframe multi-user time-division multiplexing, the device's computing power can be kept busy, but too many tasks, or individual tasks that run too long, increase waiting time and response latency.
Disclosure of Invention
Therefore, embodiments of the present application provide a dynamic computing power splitting method, system, computer device and storage medium that can solve the technical problems of low resource utilization and long waiting times in computing power splitting. The specific technical solution comprises the following:
in a first aspect, an embodiment of the present application provides a dynamic computing power splitting method, the method comprising:
collecting historical system load data and task execution patterns, and predicting future resource usage with a machine learning model;
monitoring overall system load data;
searching for available computing resources in the system;
triggering a computing power splitting decision if the overall system load data exceeds an overall load threshold, the resource demand of a single task exceeds a resource shortage threshold, and/or the predicted future resource usage exceeds the system's computing capacity;
matching the available computing resources against the resource parameters of the tasks to be executed to obtain a compatibility result;
formulating a splitting strategy according to the compatibility result, the available computing resources and the resource parameters of the tasks to be executed;
and reallocating the available computing resources according to the splitting strategy.
Preferably, the method further comprises:
Setting a task ordering rule, wherein the task ordering rule comprises a task priority strategy and/or a time slice rotation strategy;
maintaining a task queue according to a task ordering rule, wherein the task queue stores tasks to be executed and resource parameters of the tasks to be executed;
wherein the step of formulating a splitting strategy according to the compatibility result, the available computing resources and the resource parameters of the task to be executed comprises:
formulating the splitting strategy according to the compatibility result, the task queue and the available computing resources.
Preferably, the method further comprises:
setting a task ordering rule, wherein the task ordering rule comprises a priority policy;
acquiring the priority of each executing task according to the priority policy;
releasing, as available computing resources, the computing resources of executing tasks whose priority is lower than a preset low-priority value, and/or releasing, as available computing resources, the computing resources of executing tasks whose priority is lower than that of the task to be executed.
Preferably, the method further comprises:
when a first task among the executing tasks is interrupted:
storing the execution state of the first task;
releasing the computing resources of the first task as available computing resources;
and re-executing the first task once the interrupt clears.
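The interrupt-handling steps above (store the execution state, release the resources, re-execute from the saved state once the interrupt clears) can be sketched as follows; all class and function names are illustrative, not taken from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    gpu_mem_mb: int
    state: dict = field(default_factory=dict)  # saved execution state (checkpoint)

class ResourcePool:
    def __init__(self, total_mem_mb):
        self.free_mem_mb = total_mem_mb

    def allocate(self, task):
        if self.free_mem_mb < task.gpu_mem_mb:
            raise RuntimeError("insufficient resources")
        self.free_mem_mb -= task.gpu_mem_mb

    def release(self, task):
        self.free_mem_mb += task.gpu_mem_mb

def handle_interrupt(pool, task, progress):
    """Store the task's execution state and release its resources."""
    task.state["progress"] = progress   # execution state saved before suspension
    pool.release(task)                  # freed capacity becomes available

def resume_after_interrupt(pool, task):
    """Re-execute the task from its saved state once the interrupt clears."""
    pool.allocate(task)
    return task.state.get("progress", 0)  # resume point

pool = ResourcePool(total_mem_mb=16000)
t = Task("training-job", gpu_mem_mb=4000)
pool.allocate(t)
handle_interrupt(pool, t, progress=42)
resumed_at = resume_after_interrupt(pool, t)
```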
Preferably, before reallocating the available computing resources according to the splitting policy, the method further comprises:
predicting, with a modeling tool or a sandbox environment, how the system's computing resources will run after being split according to the splitting strategy.
Preferably, the method further comprises:
performing a verification test, the verification test comprising a benchmark test, a stress test and/or a stability test;
if the verification test shows a negative effect, rolling back to the previous system computing resource allocation configuration;
and if the verification test result falls short of the expected value, adjusting the splitting strategy or optimizing the execution mode of the task.
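The rollback logic of this preferred embodiment might look like the following sketch; the 10% regression margin and the configuration keys are assumptions, not from the patent:

```python
def run_verification(results, expectations):
    """Classify verification-test results: 'rollback' on a clear negative
    effect, 'tune' if expectations are merely not met, else 'keep'.
    The 10% regression margin is an illustrative assumption."""
    if any(r < e * 0.9 for r, e in zip(results, expectations)):
        return "rollback"   # restore previous resource-allocation configuration
    if any(r < e for r, e in zip(results, expectations)):
        return "tune"       # adjust splitting strategy or task execution mode
    return "keep"

config_history = [{"vgpu_slices": 2}]   # previous known-good configuration

def apply_and_verify(new_config, results, expectations):
    verdict = run_verification(results, expectations)
    if verdict == "rollback":
        return config_history[-1]       # roll back to the last good config
    config_history.append(new_config)
    return new_config
```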
Preferably, searching for available computing resources in the system comprises:
detecting system computing resources;
detecting the execution progress and performance indicators of tasks being executed in the system;
and obtaining the available computing resources from the data on system computing resources occupied by the executing tasks.
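A minimal sketch of deriving the available computing resources as the total resources minus what executing tasks occupy; resource names and quantities are made up for illustration:

```python
def available_resources(total, occupied_by_task):
    """Available = total minus what executing tasks occupy, per resource type."""
    used = {}
    for usage in occupied_by_task.values():
        for res, amt in usage.items():
            used[res] = used.get(res, 0) + amt
    return {res: total[res] - used.get(res, 0) for res in total}

# Illustrative figures for one GPU card and two executing tasks.
total = {"gpu_mem_mb": 24000, "cuda_cores": 6912}
occupied = {"task-a": {"gpu_mem_mb": 8000, "cuda_cores": 2000},
            "task-b": {"gpu_mem_mb": 4000}}
free = available_resources(total, occupied)
```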
In a second aspect, embodiments of the present application provide a dynamic computing power splitting system, the system comprising:
the resource and task monitoring module, used for collecting historical system load data and task execution patterns, and predicting future resource usage with a machine learning model;
the load evaluation and demand analysis module, used for monitoring the overall load data of the system;
the adjustable resource query module, used for searching for available computing resources in the system;
the judging module, used for triggering a computing power splitting decision if the overall system load data exceeds a preset load threshold, a single task's resource demand exceeds a preset task threshold, and/or the predicted future resource usage exceeds the system's computing capacity;
the resource compatibility checking module, used for matching the available computing resources against the resource parameters of the tasks to be executed and obtaining a compatibility result;
the strategy formulation module, used for formulating a splitting strategy according to the compatibility result, the available computing resources and the resource parameters of the tasks to be executed;
and the execution module, used for reallocating the available computing resources according to the splitting strategy.
In a third aspect, embodiments of the present application provide a computer device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of any of the dynamic computing power splitting methods above.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the dynamic computing power splitting methods above.
In summary, compared with the prior art, the technical scheme provided by the embodiment of the application has the following beneficial effects:
1. By continuously monitoring each computing task's resource usage and load trend, predicting future resource usage with a machine learning model, and thus detecting computing power adjustment needs in time, and by setting thresholds and formulating a splitting strategy that matches task characteristics to resources, compatibility and availability are ensured, resources are fully utilized, waiting time is shortened, and computing power is allocated reasonably;
2. The task ordering rule may be either one of, or a combination of, a priority policy and a time slice rotation policy. This makes it convenient to order tasks and/or schedule their time on resources according to characteristics such as urgency and real-time requirements, so that resources are used reasonably.
Drawings
Fig. 1 is a flow chart of a dynamic segmentation method according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of a dynamic splitting method according to another embodiment of the present application.
FIG. 3 is a second flowchart of a dynamic segmentation method according to another embodiment of the present application.
FIG. 4 is a flowchart illustrating a third embodiment of a dynamic splitting method according to the present application.
FIG. 5 is a flowchart illustrating a dynamic splitting method according to another embodiment of the present application.
FIG. 6 is a flowchart of a dynamic splitting method according to another embodiment of the present application.
FIG. 7 is a flowchart illustrating a dynamic splitting method according to another embodiment of the present application.
Detailed Description
The embodiments are provided only to explain the present application and are not to be construed as limiting it. After reading this specification, those skilled in the art may make modifications to the embodiments that involve no inventive contribution as needed; all such modifications are protected by patent law within the scope of the claims of the present application.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In addition, the term "and/or" in the present application merely describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. In the present application, unless otherwise specified, the character "/" generally indicates an "or" relationship between the associated objects.
The terms "first", "second", and so on in the present application are used to distinguish between identical or similar items whose functions are substantially the same. It should be understood that there is no logical or chronological dependency among "first", "second", through "nth", and no limitation on their number or order of execution.
The term "at least one" in the present application means one or more, and "a plurality" means three or more, for example, a plurality of first positions means three or more first positions.
Embodiments of the application are described in further detail below with reference to the drawings.
Referring to fig. 1, in one embodiment of the present application, a dynamic computing power splitting method is provided; the main steps of the method are described as follows:
S1, collecting historical system load data and task execution patterns, and predicting future resource usage with a machine learning model;
S2, monitoring overall system load data;
S3, searching for available computing resources in the system;
S4, triggering a computing power splitting decision if the overall system load data exceeds a preset load threshold, a single task's resource demand exceeds a preset task threshold, and/or the predicted future resource usage exceeds the system's computing capacity;
S5, matching the available computing resources against the resource parameters of the tasks to be executed, and obtaining a compatibility result;
S6, formulating a splitting strategy according to the compatibility result, the available computing resources and the resource parameters of the tasks to be executed;
and S7, reallocating the available computing resources according to the splitting strategy.
In this embodiment, to explain it fully, an example of software and hardware cooperation capable of implementing the method is given below; other embodiments of the present application may use existing software and hardware combinations that achieve the same functions, which are not detailed here. Efficient scheduling of GPU resources is achieved through deep integration with Kubernetes (K8s). Through the NVIDIA device plugin mechanism, K8s can accurately identify and allocate GPU resources; a user only needs to specify the GPU device and video memory requirement in the Pod definition to use resources efficiently. Thanks to its excellent extensibility, Kubernetes provides a solid guarantee of GPU support for containerized applications. MIG technology provides multi-instance GPUs: a GPU can be finely partitioned into several mutually independent computing units, so that each application enjoys fine-grained resource isolation while resources are still shared, ensuring efficient and safe operation in a virtualized environment. With the Kubernetes Horizontal Pod Autoscaler, or an independently developed autoscaler plugin, resources are adjusted intelligently according to GPU load; this automated mechanism can increase or decrease GPU resources on its own.
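The Pod-level GPU request mentioned above could look like the following manifest, written here as a Python dict. `nvidia.com/gpu` is the standard resource name exposed by the NVIDIA device plugin; the image name is a placeholder, and memory-slicing resource names depend on the vGPU/HAMi plugin in use:

```python
# Minimal Pod manifest requesting one GPU via the NVIDIA device plugin.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-task"},
    "spec": {
        "containers": [{
            "name": "worker",
            "image": "cuda-app:latest",  # placeholder image
            "resources": {"limits": {"nvidia.com/gpu": 1}},
        }]
    },
}
```

Submitted to the cluster (for example with `kubectl apply` or the Kubernetes Python client), this lets the scheduler place the Pod on a node with a free GPU.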
In this embodiment, first, a computing system including a GPU card is turned on, and a related driver and management software are loaded, where in this embodiment, the driver includes:
Resource management module
1) Hardware resource awareness
The driver is responsible for detecting and identifying the various computing resources in the system, including CPU cores, GPU cores, memory modules, storage devices, and so on. For example, for a system with multiple CPU cores, the driver needs to know exactly each core's model, frequency, cache size and other parameters. When identifying a GPU, it determines information such as the GPU model, video memory capacity, video memory bandwidth, and the number of CUDA cores (for NVIDIA GPUs) or stream processors (for AMD GPUs).
This information is obtained by communicating with the underlying interface of the computer hardware. Taking the Linux system as an example, the driver may collect detailed data of the hardware resources by using related entries in the/proc and/sys file systems or by calling the BIOS/UEFI interface provided by the hardware vendor.
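Reading the /proc entries mentioned above can be sketched as follows; the parser handles /proc/cpuinfo-style text and is demonstrated on a captured sample so it does not depend on the host. Field names follow the real /proc/cpuinfo format:

```python
def parse_cpuinfo(text):
    """Parse /proc/cpuinfo-style text into one dict per logical core."""
    cores, current = [], {}
    for line in text.splitlines():
        if not line.strip():            # blank line separates cores
            if current:
                cores.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cores.append(current)
    return cores

# Captured sample; on Linux the same parser works on open("/proc/cpuinfo").read().
sample = """processor : 0
model name : Example CPU @ 3.00GHz
cache size : 512 KB

processor : 1
model name : Example CPU @ 3.00GHz
cache size : 512 KB
"""
cores = parse_cpuinfo(sample)
```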
Through the operation, the information of the available computing resources in the system is searched, and the task matching and compatibility matching can be conveniently carried out subsequently.
2) Resource allocation policy
Based on application demands and the overall state of the system, the resource management module determines a computing power splitting policy. For example, when multiple applications request computing resources at the same time, the driver may employ a priority policy: applications with strict real-time requirements, such as real-time video stream processing software, are assigned high priority to ensure they obtain sufficient computing power first.
The time slice rotation strategy can also be adopted to distribute the computing power resources among different application programs according to a certain time interval. For example, the usage rights of the CPU core are switched from one application to another at regular intervals (e.g., 10 milliseconds) to ensure that each application gets a certain computation time.
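The time-slice rotation described above (switching compute-unit ownership between applications at a fixed interval, e.g. 10 ms) can be sketched as a simple round-robin schedule; the task names and quantum are illustrative:

```python
from collections import deque

def round_robin(tasks, quantum_ms, total_ms):
    """Rotate compute-unit ownership among tasks every `quantum_ms`.
    Returns the schedule as a list of task names, one per quantum."""
    queue = deque(tasks)
    schedule = []
    for _ in range(total_ms // quantum_ms):
        task = queue.popleft()
        schedule.append(task)   # this task owns the core for one slice
        queue.append(task)      # back of the queue for its next turn
    return schedule

schedule = round_robin(["render", "train", "etl"], quantum_ms=10, total_ms=60)
```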
3) Resource monitoring and adjustment
It is important to continuously monitor the use of computational resources. The driver will track the occupancy rate of each application to the CPU, GPU, etc. resources in real time, for example by reading performance counters provided by the operating system or monitoring registers of the hardware itself.
The driver may automatically adjust when it finds that an application is over-occupied or under-utilized. For example, if one data processing application occupies a large amount of GPU resources for a long time, resulting in the other graphics rendering applications being stuck, the driver may properly reduce the GPU resource allocation of that data processing application to ensure proper operation of the graphics rendering application.
Task scheduling module
1) Task queue management
The driver needs to maintain a task queue that stores computing tasks waiting to be executed. These tasks may come from different applications, and each carries information such as its type (e.g., CPU-intensive or GPU-intensive), priority, and required amount of resources. When a new task enters the queue, the driver inserts it at the appropriate position according to its priority and resource requirements. For example, a high-priority deep learning training task may be placed near the front of the queue so that computing resources are allocated to it as soon as possible.
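The queue-insertion behavior just described (higher-priority tasks placed ahead so they receive resources sooner, submission order preserved among equals) can be sketched with a binary heap; names and resource figures are illustrative:

```python
import heapq
import itertools

class TaskQueue:
    """Priority task queue: a lower number means higher priority; the
    monotonic counter keeps FIFO order within the same priority level."""
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, name, priority, resources):
        heapq.heappush(self._heap, (priority, next(self._counter), name, resources))

    def pop_next(self):
        _priority, _seq, name, resources = heapq.heappop(self._heap)
        return name, resources

q = TaskQueue()
q.submit("pretrain", priority=5, resources={"gpu_mem_mb": 8000})
q.submit("dl-train", priority=1, resources={"gpu_mem_mb": 16000})  # urgent
q.submit("etl", priority=5, resources={"gpu_mem_mb": 1000})
first, _ = q.pop_next()
```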
2) Task allocation and execution
The task scheduling module distributes tasks to the appropriate computing units for execution according to the availability of resources and the needs of the tasks. For a CPU-GPU heterogeneous computing system, it will determine whether the task is suitable for execution on the CPU or on the GPU. For example, for highly parallelized computational tasks such as matrix multiplication, the driver may allocate them to execution on the GPU because the GPU has a large number of parallel computational units and can complete the computation faster.
During task execution, the driver also needs to handle interrupts and resumptions of tasks. If an interrupt event (e.g., hardware failure, high priority task insertion, etc.) occurs to the system, the driver needs to pause the task currently being executed, save the execution state of the task, and then reallocate the computing resources to a new task or handle the interrupt event. After the interrupt event processing is completed, the driver may resume execution of the suspended task.
Interface module
1) Application Program Interface (API)
APIs include function calls, system calls, or object-oriented programming interfaces. For example, an application may request a certain amount of CPU core or GPU memory by calling an API function provided by a driver.
These APIs may also be used for applications to pass information about tasks to drivers, such as task priorities, predicted execution times, etc. Meanwhile, the driver may feed back the result of resource allocation to the application program through the API, for example, whether the required resource is successfully allocated or not, and specific information (for example, the allocated CPU core number, GPU device ID, etc.) of the allocated resource.
2) Operating system interface
These interfaces are used to receive scheduling instructions for the operating system, report resource usage to the operating system, and so on. For example, when an operating system needs to pause an application to free up resources, it will send instructions to the driver via the driver's operating system interface.
The driver may also obtain global information of the system, such as a load condition of the system, a state of other devices, and the like, through an operating system interface. This facilitates the driver to better allocate resources and schedule tasks to accommodate the overall operating environment of the system.
In this embodiment, virtual GPUs are carved out using NVIDIA vGPU software, which allows the computing power of one physical GPU to be split into multiple virtual GPUs, each of which can be assigned to a different virtual machine or container. For example, in a data center, multiple users can share one physical GPU through vGPU technology while running graphics-intensive applications or deep learning model training. It supports several GPU splitting modes, such as fixed allocation (each virtual GPU gets a fixed share of CUDA cores, video memory and other resources) and dynamic allocation (virtual GPU resources are adjusted dynamically according to application needs). The HAMi open-source splitting software supports multiple computing power splitting modes such as MIG and MPS, can split computing power at a granularity as fine as 1%, and splits video memory at megabyte granularity. It accommodates both domestic and non-domestic computing hardware and can mix, deploy, schedule and manage heterogeneous computing clusters from vendors such as NVIDIA, Huawei, Hygon, Cambricon, Tianshu Zhixin, MetaX and Moore Threads. Virtualization puts one card to multiple uses and improves hardware utilization.
Under the Linux system, NVIDIA GPUs can be monitored in real time with the nvidia-smi command (and AMD GPUs with rocm-smi); using the parameter information these interfaces return, software can collect the GPU card's parameters in real time, including but not limited to video memory utilization, core frequency, temperature, and the utilization of each computing unit.
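`nvidia-smi` has a scriptable query mode (`--query-gpu=... --format=csv,noheader,nounits`) whose CSV output monitoring software can parse. The sketch below parses a captured sample so it runs without a GPU; the particular query fields chosen are an assumption:

```python
import csv
import io

# Example invocation (requires an NVIDIA GPU and driver):
#   nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total,temperature.gpu \
#              --format=csv,noheader,nounits
def parse_smi_csv(text):
    """Parse `nvidia-smi --query-gpu` CSV output into one dict per GPU."""
    fields = ["util_pct", "mem_used_mb", "mem_total_mb", "temp_c"]
    return [dict(zip(fields, map(int, row)))
            for row in csv.reader(io.StringIO(text)) if row]

# Captured sample output for a two-GPU machine.
sample = "63, 10240, 24576, 71\n12, 2048, 24576, 55\n"
gpus = parse_smi_csv(sample)
```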
In this embodiment, the overall load data is weighted by comprehensively considering the memory and the usage of the computational core. In other embodiments of the application, other parameters may be used, such as any one, two or more of the following in combination:
GPU usage: how busy the graphics processing unit is during a given period, typically expressed as a percentage; 100% means the GPU is running at full load.
Memory usage: the proportion of video memory in use, also typically expressed as a percentage, or directly as the ratio of used to total video memory.
Temperature: the operating temperature of the graphics card is also an important load indicator; excessive temperatures may cause performance degradation or automatic down-clocking to protect the hardware.
Fan speed: reflects the effort the card is making to hold a safe temperature, typically expressed in revolutions per minute (RPM) or as a percentage of maximum speed.
Power consumption: the power currently drawn by the card is another key indicator of how hard it is working; power consumption is typically measured in watts (W).
Clock speed: includes the core frequency and the video memory frequency and reflects the card's current running speed; dynamic frequency adjustment is a common way for modern cards to manage performance and power.
API call rate: indicators such as draw calls per second or frame generation time can reflect the card's load for some applications, especially games.
Render latency: the time from when a rendering command is submitted until rendering actually completes can also serve as a load indicator.
Other custom indicators: for example, throughput or inference time in machine learning tasks; different application scenarios may involve other load-related indicators.
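The weighted combination of video memory usage and computing-core usage mentioned at the start of this passage can be sketched as follows; the 50/50 weights are an assumption, since the text does not specify them:

```python
def overall_load(mem_used_mb, mem_total_mb, core_util_pct,
                 w_mem=0.5, w_core=0.5):
    """Weighted overall load in [0, 1] combining video memory usage and
    computing-core utilization; the equal weights are illustrative."""
    mem_util = mem_used_mb / mem_total_mb
    return w_mem * mem_util + w_core * core_util_pct / 100.0

# Example: 18 GB of 24 GB memory in use, cores at 80% utilization.
load = overall_load(mem_used_mb=18000, mem_total_mb=24000, core_util_pct=80)
```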
In this embodiment, collecting historical data includes, but is not limited to, collecting performance monitoring data over a period of time, including key metrics such as CPU, GPU, memory, network bandwidth, etc. Workload logs are collected, and workload conditions of different time periods, such as task submission frequency, task type, execution time and the like, are recorded.
Task execution patterns: similar workloads are grouped with clustering algorithms to better understand the resource requirements of different task types.
An existing machine learning model is then used for predictive analysis: combining the historical load data and task execution patterns, it predicts how the load will change over a coming period. If an impending resource bottleneck is predicted, i.e., future resource usage will exceed the system's computing capacity, a computing power splitting decision can be made even if the overall load threshold has not yet been exceeded.
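As a stand-in for the unspecified machine learning model, even a least-squares trend fit over historical load illustrates the early-trigger idea: if the extrapolated load crosses system capacity, a splitting decision can be made before the current load threshold is reached. All numbers are illustrative:

```python
def predict_next(history, steps_ahead=1):
    """Least-squares linear trend over the load history, extrapolated
    `steps_ahead` samples into the future."""
    n = len(history)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history) / n
    denom = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history)) / denom
    intercept = y_mean - slope * x_mean
    return slope * (n - 1 + steps_ahead) + intercept

def bottleneck_imminent(history, capacity, steps_ahead=3):
    """Trigger a splitting decision early if the trend crosses capacity."""
    return predict_next(history, steps_ahead) > capacity

history = [40, 50, 60, 70, 80]   # load growing by 10 units per sample
```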
3) Evaluating existing resources
Resource utilization analysis: evaluate the utilization of existing resources and find under-used or over-used resources, i.e., the available computing resources. Identify bottlenecks in the system, such as CPU/GPU, memory, or network bottlenecks, and analyze their impact on overall performance.
Threshold comparison: set several thresholds, including an overall GPU load threshold (e.g., video memory utilization above 80%, computing core utilization above 70%) and a per-task resource shortage threshold (e.g., a task's video memory demand growing by more than a set proportion in a short time and approaching its current allocation ceiling). In use, an upper resource limit is set for each task or user group to prevent a single task from occupying excessive resources. When the monitored data exceeds these thresholds, a computing power splitting decision is triggered.
In this embodiment, the computing power splitting decision is triggered when any one of the following holds: the overall system load data exceeds the overall load threshold, a single task's resource demand exceeds the resource shortage threshold, or the predicted future resource usage exceeds the system's computing capacity. In other embodiments of the present application, any two, or all three, of these trigger conditions may be combined instead.
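The trigger condition of this embodiment (any one of the three conditions suffices) reduces to a three-way OR; all thresholds below are illustrative:

```python
def should_split(overall_load, overall_threshold,
                 task_demand, task_threshold,
                 predicted_usage, capacity):
    """OR-combination of the three trigger conditions: any one of them
    triggers the computing power splitting decision in this embodiment."""
    return (overall_load > overall_threshold
            or task_demand > task_threshold
            or predicted_usage > capacity)

# Example: overall load slightly above its threshold triggers splitting.
triggered = should_split(overall_load=0.85, overall_threshold=0.80,
                         task_demand=0.30, task_threshold=0.50,
                         predicted_usage=0.70, capacity=1.00)
```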
The system is then searched for available GPU resources, i.e., available computing resources, including other idle GPU cards, under-utilized portions of the current GPU card (for example, computing units that are idle), or computing power that can be temporarily reclaimed from low-priority tasks. A resource compatibility check then ensures that the resources to be allocated are compatible with the currently running tasks and the system environment, for example checking whether the GPU architecture, driver version and video memory type match, which yields the compatibility match between tasks and available computing resources, i.e., the compatibility result. In this embodiment, similar workloads are also grouped with a clustering algorithm to better understand the resource requirements of different task types.
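The compatibility check just described (matching GPU architecture, driver version, video memory type and free memory against the task's resource parameters) can be sketched as follows; the field names, version tuples and figures are assumptions:

```python
def check_compatibility(resource, task_params):
    """Match an available resource against a task's resource parameters:
    architecture and memory type must match exactly, the driver must be at
    least the required version, and free memory must cover the demand."""
    return (resource["arch"] == task_params["arch"]
            and resource["mem_type"] == task_params["mem_type"]
            and resource["driver"] >= task_params["min_driver"]
            and resource["free_mem_mb"] >= task_params["mem_mb"])

gpu = {"arch": "ampere", "mem_type": "HBM2", "driver": (535, 0),
       "free_mem_mb": 10000}
task = {"arch": "ampere", "mem_type": "HBM2", "min_driver": (470, 0),
        "mem_mb": 8000}
compatible = check_compatibility(gpu, task)
```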
Finally, based on task characteristics, a segmentation strategy is formulated according to compatibility results, available computing resources and resource parameters of the task to be executed. The resource parameters of the task to be executed are GPU architecture, driving version, video memory type and the like required by the task execution. For the key tasks with high priority (such as online deep learning reasoning service), the calculation power demand is guaranteed preferentially, and for the tasks with low real-time requirements (such as background model pre-training tasks), the calculation power distribution can be adjusted appropriately. When the calculation force is split, load balance among the GPU cards is kept as much as possible, and the situation that some GPU cards are used excessively and other GPU cards are idle is avoided. Tasks and computing forces can be reasonably distributed according to the performance difference of the GPU cards (such as different computing forces of GPUs with different models). In other embodiments of the present application, the slicing strategy may be formulated according to other parameters that need to be considered.
The computing power of the GPU card is then adjusted dynamically through an interface provided by GPU management software or the driver. This may include reallocating video memory, adjusting the number of computing cores assigned, changing the execution queues of tasks, and so on.
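On NVIDIA hardware with MIG support, one such interface is the `nvidia-smi mig` command family. The snippet below only builds (and does not execute) an invocation that would carve a card into instances; the profile IDs are device dependent and the exact flags vary with driver version, so treat this as an assumed example rather than a definitive recipe:

```python
import shlex

def mig_create_cmd(gpu_index, profile_ids):
    # Build, but do not run, an nvidia-smi MIG command that would create
    # GPU instances (-cgi) and the corresponding compute instances (-C).
    return ["nvidia-smi", "mig", "-i", str(gpu_index),
            "-cgi", ",".join(str(p) for p in profile_ids), "-C"]

cmd = mig_create_cmd(0, [9, 9])
print(shlex.join(cmd))
```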
According to the present application, the resource usage and load trend of each computing task are continuously monitored, resource shortfalls are predicted through the machine learning algorithm model, and the need for computing power adjustment is discovered in time. The segmentation strategy is formulated by setting thresholds and matching task characteristics with resources, ensuring compatibility and availability, achieving full utilization of resources, shortening waiting time, and realizing reasonable computing power allocation.
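The triggering logic combines three conditions with "and/or". A minimal sketch, with all thresholds and units assumed for illustration:

```python
def should_segment(overall_load, load_threshold,
                   task_demands, shortage_threshold,
                   predicted_usage, capacity):
    # Trigger the segmentation decision if the overall load exceeds its
    # threshold, any single task's demand exceeds the resource-shortage
    # threshold, or predicted future usage exceeds system capacity.
    return (overall_load > load_threshold
            or any(d > shortage_threshold for d in task_demands)
            or predicted_usage > capacity)

# Here the overall load is fine, but one task demands 12 units (> 10).
fired = should_segment(0.70, 0.85, [12, 3], 10, 40, 64)
print(fired)
```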
Referring to fig. 2, in another embodiment of the present application, further comprising:
S61, setting a task ordering rule, wherein the task ordering rule comprises a task priority strategy and/or a time slice rotation strategy;
S62, maintaining a task queue according to the task ordering rule, wherein the task queue stores the tasks to be executed and the resource parameters of the tasks to be executed;
and the step of formulating a segmentation strategy according to the compatibility result, the available computing resources and the resource parameters of the task to be executed is specifically:
S63, formulating a segmentation strategy according to the compatibility result, the task queue and the available computing resources.
In this embodiment, the task ordering rule may be either one of, or a combination of, the priority strategy and the time slice rotation strategy. If the task ordering rule is set to the priority strategy, tasks can conveniently be ordered by characteristics such as urgency and real-time requirements, so that resources are used reasonably.
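Such a task queue can be sketched with a binary heap: priority decides the order, and an insertion counter gives FIFO behaviour among equal priorities, a simple stand-in for rotation. All names here are illustrative:

```python
import heapq
from itertools import count

class TaskQueue:
    # Priority ordering with FIFO behaviour among equal priorities -- a
    # simple approximation of combining the priority strategy with a
    # time-slice rotation strategy.
    def __init__(self):
        self._heap = []
        self._seq = count()          # tie-breaker: first in, first out

    def push(self, name, priority, resource_params=None):
        # Negate the priority so that the highest priority pops first.
        heapq.heappush(self._heap,
                       (-priority, next(self._seq), name, resource_params or {}))

    def pop(self):
        _, _, name, params = heapq.heappop(self._heap)
        return name, params

q = TaskQueue()
q.push("pretrain", 1, {"arch": "Ampere"})
q.push("inference", 9)
q.push("etl", 5)
order = [q.pop()[0] for _ in range(3)]
print(order)
```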
Referring to fig. 3, in another embodiment of the present application, further comprising:
S301, setting a task ordering rule, wherein the task ordering rule comprises a priority strategy;
S302, acquiring the priority of each task being executed through the priority strategy;
S303, releasing the computing resources of executing tasks whose priority is lower than a preset low-priority value as available computing resources, and/or releasing the computing resources of executing tasks whose priority is lower than that of the task to be executed as available computing resources.
In this embodiment, priorities are set for tasks. When the overall load data of the system exceeds the overall load threshold, the resources of low-priority tasks are taken over, so that high-priority tasks can be executed first, reducing the delay or lag experienced at the user end. When the resources of a low-priority task are taken over, the task re-enters the ordering. In other examples of this embodiment, a low-priority task whose computing resources have been taken over may instead wait for the occupying task to finish its computation and then reclaim its original computing resources.
In this embodiment, the preset low-priority value is the priority level, set in advance, below which a low-priority task has its occupied resources released. For example, when computing power needs to be segmented finely, the preset low-priority value may be 3, so that the resources occupied by all tasks with priority lower than 3 are released. In other examples of this embodiment, the preset low-priority value may be determined dynamically from the priority levels of running tasks whenever the resource-shortage threshold is exceeded, so as to schedule resources dynamically, release more tasks with low real-time requirements, and free computing resources for high-priority tasks.
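The release rule of S303 can be sketched as a filter over the running tasks. The data layout and abstract "units" are assumptions for illustration:

```python
def release_low_priority(running, preset_low=3, incoming_priority=None):
    # Free the resources of executing tasks whose priority is below the
    # preset low value and/or below that of the task waiting to run.
    released_units, kept = 0, []
    for t in running:
        below_preset = t["priority"] < preset_low
        below_incoming = (incoming_priority is not None
                          and t["priority"] < incoming_priority)
        if below_preset or below_incoming:
            released_units += t["units"]   # becomes available computing power
        else:
            kept.append(t)
    return released_units, kept

released, kept = release_low_priority(
    [{"name": "bg", "priority": 2, "units": 4},
     {"name": "etl", "priority": 4, "units": 2},
     {"name": "svc", "priority": 8, "units": 6}],
    preset_low=3, incoming_priority=5)
print(released, [t["name"] for t in kept])
```

Here "bg" is released by the preset low value and "etl" by comparison with the incoming task; the high-priority service keeps its resources.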
Referring to fig. 4, in another embodiment of the present application, further comprising:
S311, when a first task among the tasks being executed is interrupted;
S312, saving the execution state of the first task;
S313, releasing the computing resources of the first task as available computing resources;
and S314, re-executing the first task once the interrupt has cleared.
In the present embodiment, the first task is any type of task being executed, and is not limited herein.
When the first task is interrupted, the resources it occupies are released so that they can be fully utilized by other tasks, improving the flexibility of resource use.
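Steps S312 to S314 amount to a checkpoint-and-resume cycle around a shared resource pool. A minimal sketch with assumed structures:

```python
class Task:
    def __init__(self, name, units):
        self.name, self.units, self.progress = name, units, 0

def handle_interrupt(task, pool):
    # S312: save the execution state; S313: release its resources.
    state = {"name": task.name, "progress": task.progress}
    pool["free"] += task.units
    return state

def resume(state, units, pool):
    # S314: re-execute the first task once the interrupt has cleared,
    # reclaiming resources and restoring the saved progress.
    pool["free"] -= units
    task = Task(state["name"], units)
    task.progress = state["progress"]
    return task

pool = {"free": 2}
t = Task("train", 4)
t.progress = 37
saved = handle_interrupt(t, pool)   # pool["free"] -> 6 while interrupted
resumed = resume(saved, 4, pool)    # pool["free"] -> 2 again
print(pool["free"], resumed.progress)
```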
Referring to fig. 5, in another embodiment of the present application, before step S8, the method further includes:
S7, predicting the running condition of the system computing resources after segmentation according to the segmentation strategy, using a modeling tool or a sandbox environment.
In this embodiment, the modeling tool or sandbox environment is used to predict the impact of the adjustment and evaluate possible risks and benefits, reducing the adverse effects of putting the segmentation strategy into operation.
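A toy stand-in for such a dry run: compute the per-card utilization a plan would produce and flag cards above a risk threshold. The threshold and data shapes are illustrative assumptions:

```python
def simulate_split(assignments, capacities, risk_threshold=0.9):
    # Dry-run of a segmentation plan: predicted per-card utilization plus
    # a list of cards that would exceed the risk threshold.
    util = {g: 0.0 for g in capacities}
    for gpu, units in assignments:
        util[gpu] += units / capacities[gpu]
    risky = [g for g, u in util.items() if u > risk_threshold]
    return util, risky

# Two tasks land on card 0 (10 units total) and one on card 1.
util, risky = simulate_split([(0, 8), (0, 2), (1, 4)], {0: 10, 1: 10})
print(util, risky)
```

Card 0 would be fully saturated, so the plan could be revised before it touches the real system.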
Referring to fig. 6, in another embodiment of the present application, further comprising:
S9, performing verification tests, wherein the verification tests comprise a benchmark test, a stress test and/or a stability test;
S10, if the verification test result shows a negative effect, rolling back to the previous system computing resource allocation configuration;
and S11, if the verification test result does not reach the expected value, adjusting the segmentation strategy or optimizing the execution mode of the task.
In this embodiment, the verification tests include any one or more of a benchmark test, a stress test, and a stability test.
Benchmark test: a series of benchmarks is run, covering daily operations and extreme cases, to evaluate the effect of the new configuration.
Stress test: a high load is applied to the system and its performance under limit conditions is checked, ensuring the robustness of the system.
Stability test: the system is run for a long period while observing whether performance degradation or other abnormal conditions occur.
The performance data under the new configuration is compared with the previous baseline data to confirm whether the expected optimization objective has been reached. The configuration is then further adjusted according to the test results and continuously optimized until all targets are met. If a new configuration is found to have a negative effect, it is immediately rolled back to the previous configuration and the adjustment scheme is re-evaluated. If the performance of a task falls short of expectations, its computing power allocation may be adjusted appropriately, or the execution of the task on the GPU card may be optimized.
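The validate-or-roll-back loop of S9 to S11 can be sketched as follows; the score function, the 10% improvement target, and the status labels are all assumptions for illustration:

```python
def apply_with_validation(new_cfg, old_cfg, run_tests):
    # run_tests(cfg) returns a performance score for a configuration.
    baseline = run_tests(old_cfg)
    score = run_tests(new_cfg)
    if score < baseline:            # negative effect -> immediate rollback
        return old_cfg, "rolled_back"
    if score < baseline * 1.1:      # below the (assumed) expected 10% gain
        return new_cfg, "needs_tuning"
    return new_cfg, "accepted"

# The new configuration scores worse than baseline, so it is rolled back.
cfg, status = apply_with_validation({"v": 2}, {"v": 1},
                                    lambda c: {1: 100, 2: 95}[c["v"]])
print(cfg, status)
```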
The specific fine tuning steps can be divided into:
1) Preparation and planning
Determine optimization objectives: identify the specific goals that fine tuning is expected to achieve, such as improving the performance of specific tasks, reducing overall cost, or raising resource utilization.
Select tools: choose appropriate monitoring and management tools, such as the NVIDIA System Management Interface (nvidia-smi), Prometheus, or Grafana, for collecting and analyzing data.
2) Current-state assessment
Baseline data were collected:
Use the monitoring tools to record current resource usage (CPU, GPU, memory, network, etc.).
Record key performance indicators (KPIs) such as response time, throughput, and error rate.
Identifying a bottleneck:
Analyze the existing data to find where resource allocation is unreasonable or where performance bottlenecks lie.
Determine which tasks or services contend for resources during peak hours.
3) Making an adjustment scheme
Defining adjustment parameters:
Task priority: set priorities for different tasks according to their business importance.
Resource quotas: set a resource upper limit for each task or user group to prevent a single task from occupying excessive resources.
Dynamic scheduling: enable or optimize the existing dynamic scheduling algorithm to adjust resource allocation automatically according to the real-time load.
Isolation mechanism: consider using containerization (Docker, Kubernetes) or virtual machine technology to isolate tasks, making resource allocation more flexible and controllable.
Simulation effect:
Use a modeling tool or sandbox environment to predict the impact after adjustment and assess possible risks and benefits.
4) Implementing the adjustment
Small-scale pilot:
Test the new configuration in a non-production environment first to ensure its stability and effectiveness.
Collect data during the pilot to compare performance before and after the adjustment.
Gradual rollout:
Based on the pilot results, roll the new configuration out gradually in the production environment, starting from low-risk areas.
After each rollout step, monitor system performance closely to ensure no new problems are introduced.
5) Verification effect
Benchmark test:
A series of benchmarks is run, covering daily operations and extreme cases, to evaluate the effect of the new configuration.
And (3) pressure test:
Apply a high load to the system and check its performance under limit conditions, ensuring the robustness of the system.
Stability test:
Run the system for a long period and observe whether performance degradation or other abnormal conditions occur.
6) Analysis and iteration
Comparison analysis:
Compare the performance data under the new configuration with the previous baseline data to confirm whether the expected optimization objective has been reached.
Continuous improvement:
Further adjust the configuration according to the test results and continuously optimize until all targets are met.
If a new configuration is found to have a negative effect, immediately roll back to the previous configuration and re-evaluate the adjustment scheme.
If the performance of a task falls short of expectations, adjust its computing power allocation appropriately or optimize the execution of the task on the GPU card.
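Baseline collection with nvidia-smi, as mentioned in the tool-selection step, typically uses its CSV query mode. The parser below handles one line of an assumed `nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits` invocation; the sample line is fabricated for illustration:

```python
def parse_smi_csv(line):
    # Parse one CSV line of "index, utilization.gpu, memory.used" into the
    # baseline KPI fields used during the current-state assessment.
    idx, util, mem = (f.strip() for f in line.split(","))
    return {"gpu": int(idx), "util_pct": int(util), "mem_mib": int(mem)}

sample = "0, 87, 31421"   # illustrative output line, not real device data
rec = parse_smi_csv(sample)
print(rec)
```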
Through the arrangement of this embodiment, the practicality, stability, and other characteristics of the updated computing power segmentation strategy are evaluated, reducing system faults and improving system reliability.
Referring to fig. 7, in another embodiment of the present application, step S3 includes:
S31, detecting the system computing resources;
S32, detecting the execution progress and performance indicators of the tasks being executed in the system;
and S33, acquiring the available computing resources according to the data on system computing resources occupied by the executing tasks.
With this arrangement, the available computing resources of the system can be detected accurately. The system computing resources include CPU cores, GPU cores, memory modules, storage devices, and so on. For a system with multiple CPU cores, the driver needs to know exactly the model, frequency, cache size, and other parameters of each core.
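Step S33 reduces to subtracting occupied resources from detected totals. A sketch with assumed resource categories:

```python
def available_resources(total, running_tasks):
    # S33: available = detected system resources minus the resources
    # occupied by the tasks currently executing.
    free = dict(total)
    for t in running_tasks:
        for kind, amount in t["uses"].items():
            free[kind] -= amount
    return free

free = available_resources(
    {"gpu_units": 20, "mem_gib": 64},
    [{"name": "infer", "uses": {"gpu_units": 6, "mem_gib": 16}},
     {"name": "train", "uses": {"gpu_units": 8, "mem_gib": 32}}],
)
print(free)
```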
In another embodiment of the present application, further comprising:
notifying an administrator and recording an operation log;
and continuously monitoring the running condition of the system computing resources.
In this embodiment, the administrator is notified of the execution of the computing power segmentation by mail, system messages, and the like, including information such as the reason for the segmentation, the adjustments made, and the impact on task performance. Operation logging means recording the data of the whole computing power segmentation process in a log file, including monitoring data, the basis of segmentation decisions, and the operations executed, enabling subsequent audit and analysis. Continuous monitoring means returning to the resource and task monitoring step and continuously monitoring the computing power usage and task load of the GPU cards in real time, forming a closed-loop dynamic management flow so that the next computing power segmentation adjustment can be made promptly as the system changes.
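One lightweight way to realize the operation log described above is an append-only JSON-lines file; the field names here are illustrative, not prescribed by the embodiment:

```python
import json
import time

def log_segmentation_event(reason, actions, impact, path=None):
    # Build one audit record of a segmentation run; append it to a
    # JSON-lines log file when a path is given.
    entry = {"ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
             "reason": reason, "actions": actions, "impact": impact}
    if path:
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry

e = log_segmentation_event("overall load exceeded threshold",
                           ["moved task 'train' to GPU 1"],
                           "inference latency reduced")
print(e["reason"])
```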
In another embodiment of the present application, the method further comprises:
and synchronizing and migrating the data.
In this embodiment, if the segmentation involves moving data between different GPU memory regions, accurate synchronization and migration of the data must be ensured. For example, in distributed deep learning training, model parameters and data may need to be redistributed among different GPU cards.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
In one embodiment of the present application, a dynamic computing power segmentation system is provided, corresponding to the dynamic computing power segmentation method in the above embodiments. The dynamic computing power segmentation system comprises:
The resource and task monitoring module is used for collecting historical load data of the system and task execution modes, and predicting future resource use conditions by adopting a machine learning algorithm model;
the load evaluation and demand analysis module is used for monitoring the overall load data of the system;
The adjustable resource query module is used for searching available computing resources in the system;
The judging module is used for triggering a computing power segmentation decision if the overall load data of the system exceeds a preset load threshold, a single task resource requirement exceeds a preset task threshold, and/or the future resource use condition exceeds the system computing capacity;
the resource compatibility checking module is used for matching the compatibility of the available computing resources and the resource parameters of the tasks to be executed and obtaining a compatibility result;
the strategy making module is used for making a segmentation strategy according to the compatibility result, the available computing resources and the resource parameters of the task to be executed;
and the execution module is used for reallocating the available computing resources according to the segmentation strategy.
The above dynamic computing power segmentation system may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the above modules.
In one embodiment of the present application, a computer device is provided, which may be a server. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device may be implemented by any type of volatile or non-volatile memory device, including but not limited to magnetic disks, optical disks, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), magnetic memory, flash memory, and PROM (Programmable Read-Only Memory). The memory of the computer device provides an environment for running the operating system and the computer programs stored therein. The network interface of the computer device is used for communicating with external terminals through a network connection. The computer program, when executed by the processor, implements the steps of the dynamic computing power segmentation method of the above embodiments.
In one embodiment of the present application, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the dynamic computing power segmentation method of the above embodiments. The computer-readable storage medium includes ROM (Read-Only Memory), RAM (Random-Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic disks, floppy disks, and the like.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the division into the above functional units and modules is illustrated by example. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus of the present application may be divided into different functional units or modules to perform all or part of the functions described above.

Claims (10)

1. A method for dynamically segmenting computing power, the method comprising:
collecting historical load data of a system and a task execution mode, and predicting future resource use conditions by adopting a machine learning algorithm model;
Monitoring system overall load data;
searching the system for available computing resources;
triggering a computing power segmentation decision if the system overall load data exceeds an overall load threshold, a single task resource requirement exceeds a resource shortage threshold, and/or the future resource use condition exceeds the system computing capacity;
Matching the compatibility of the available computing resources and the resource parameters of the task to be executed, and obtaining a compatibility result;
Formulating a segmentation strategy according to the compatibility result, the available computing resources and the resource parameters of the task to be executed;
and reallocating the available computing resources according to the segmentation strategy.
2. The method for dynamically segmenting computing power according to claim 1, wherein the method further comprises:
Setting a task ordering rule, wherein the task ordering rule comprises a task priority strategy and/or a time slice rotation strategy;
Maintaining a task queue according to the task ordering rule, wherein the task queue stores the tasks to be executed and resource parameters of the tasks to be executed;
the step of formulating a segmentation strategy according to the compatibility result, the available computing resources and the resource parameters of the task to be executed is as follows:
and formulating a segmentation strategy according to the compatibility result, the task queue and the available computing resources.
3. The method for dynamically segmenting computing power according to claim 1, wherein the method further comprises:
setting a task ordering rule, wherein the task ordering rule comprises a priority strategy;
acquiring the priority of the task being executed by the priority policy;
releasing the computing resources of executing tasks whose priority is lower than a preset low-priority value as available computing resources, and/or releasing the computing resources of executing tasks whose priority is lower than that of the task to be executed as available computing resources.
4. The method for dynamically segmenting computing power according to claim 1, wherein the method further comprises:
when a first task of the executing tasks is interrupted;
Storing the execution state of the first task;
Releasing the computing resources of the first task as available computing resources;
and re-executing the first task once the interrupt has cleared.
5. The method for dynamically segmenting computing power according to claim 1, further comprising, before the reallocating the available computing resources according to the segmentation strategy:
And predicting the running condition of the system computing resources after segmentation according to the segmentation strategy by using a modeling tool or a sandbox environment.
6. The method for dynamically segmenting computing power according to claim 1, wherein the method further comprises:
performing a validation test, the validation test including a benchmark test, a stress test, and/or a stability test;
if the verification test result is a negative effect, rolling back to the previous system computing resource allocation configuration;
and if the verification test result does not reach the expected value, adjusting the segmentation strategy or optimizing the execution mode of the task.
7. The method for dynamically segmenting computing power according to claim 1, wherein the searching the system for available computing resources comprises:
detecting system computing resources;
detecting the execution progress and performance indicators of tasks being executed in the system;
and acquiring available computing resources according to the data on system computing resources occupied by the executing tasks.
8. A system for dynamically segmenting computing power, the system comprising:
The resource and task monitoring module is used for collecting historical load data of the system and task execution modes, and predicting future resource use conditions by adopting a machine learning algorithm model;
the load evaluation and demand analysis module is used for monitoring the overall load data of the system;
The adjustable resource query module is used for searching available computing resources in the system;
The judging module is used for triggering a computing power segmentation decision if the overall load data of the system exceeds a preset load threshold, a single task resource requirement exceeds a preset task threshold, and/or the future resource use condition exceeds the system computing capacity;
the resource compatibility checking module is used for matching the compatibility of the available computing resources and the resource parameters of the task to be executed and obtaining a compatibility result;
The strategy making module is used for making a segmentation strategy according to the compatibility result, the available computing resources and the resource parameters of the task to be executed;
and the execution module is used for reallocating the available computing resources according to the segmentation strategy.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method for dynamically segmenting computing power according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the method for dynamically segmenting computing power according to any one of claims 1 to 7.
CN202510962337.9A 2025-07-14 2025-07-14 A method, system, computer device and storage medium for dynamic computing power division Pending CN120469816A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510962337.9A CN120469816A (en) 2025-07-14 2025-07-14 A method, system, computer device and storage medium for dynamic computing power division


Publications (1)

Publication Number Publication Date
CN120469816A true CN120469816A (en) 2025-08-12

Family

ID=96629000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510962337.9A Pending CN120469816A (en) 2025-07-14 2025-07-14 A method, system, computer device and storage medium for dynamic computing power division

Country Status (1)

Country Link
CN (1) CN120469816A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120723050A (en) * 2025-09-01 2025-09-30 苏州元脑智能科技有限公司 Power loop switching control method, electronic device, computer storage medium and program product
CN120848966A (en) * 2025-09-19 2025-10-28 深圳市众鸿科技股份有限公司 Vehicle-mounted multi-system dynamic switching method, system, device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116204308A (en) * 2023-02-16 2023-06-02 北京贝思科技术有限公司 Dynamic adjusting method and device for audio and video computing power and electronic equipment
CN117032974A (en) * 2023-08-17 2023-11-10 福建万福信息技术有限公司 Dynamic scheduling method and terminal based on resource application
CN117608850A (en) * 2023-12-01 2024-02-27 西北工业大学 A multi-task computing resource allocation method and device for neural network processors
CN117611425A (en) * 2024-01-17 2024-02-27 之江实验室 Graphics processor computing power configuration method, device, computer equipment and storage medium
CN118550711A (en) * 2024-07-29 2024-08-27 广脉科技股份有限公司 Method and system for improving calculation efficiency
CN120196421A (en) * 2025-05-26 2025-06-24 瑞石数据科技(深圳)有限公司 A method and device for GPU resource virtualization computing power scheduling
CN120276861A (en) * 2025-04-07 2025-07-08 云聚数据科技(上海)有限公司 Computing power sharing system and method in multi-tenant environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
罗蕾: "《嵌入式实时操作系统及应用开发(第二版)》", 31 March 2007, 北京航空航天大学出版社, pages: 188 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination