Disclosure of Invention
Therefore, embodiments of the present application provide a dynamic computing power segmentation method, system, computer device, and storage medium, which can solve the technical problems of low resource utilization and long waiting times in computing power segmentation. The specific technical solution comprises the following aspects:
In a first aspect, an embodiment of the present application provides a dynamic computing power segmentation method, the method comprising:
collecting historical load data and task execution patterns of a system, and predicting future resource usage by using a machine learning algorithm model;
monitoring overall load data of the system;
searching for available computing resources in the system;
triggering a computing power segmentation decision if the overall load data of the system exceeds an overall load threshold, the resource demand of a single task exceeds a resource shortage threshold, and/or the predicted future resource usage exceeds the computing capacity of the system;
matching the available computing resources against the resource parameters of a task to be executed for compatibility, to obtain a compatibility result;
formulating a segmentation strategy according to the compatibility result, the available computing resources, and the resource parameters of the task to be executed;
and reallocating the available computing resources according to the segmentation strategy.
Preferably, the method further comprises:
setting a task ordering rule, wherein the task ordering rule comprises a task priority policy and/or a time-slice round-robin policy;
maintaining a task queue according to the task ordering rule, wherein the task queue stores tasks to be executed and the resource parameters of the tasks to be executed;
wherein formulating a segmentation strategy according to the compatibility result, the available computing resources, and the resource parameters of the task to be executed comprises:
formulating a segmentation strategy according to the compatibility result, the task queue, and the available computing resources.
Preferably, the method further comprises:
setting a task ordering rule, wherein the task ordering rule comprises a priority policy;
acquiring the priorities of executing tasks through the priority policy;
releasing the computing resources of executing tasks whose priority is lower than a preset low-priority threshold as available computing resources, and/or releasing the computing resources of executing tasks whose priority is lower than that of the task to be executed as available computing resources.
Preferably, the method further comprises:
when a first task among the executing tasks is interrupted:
saving the execution state of the first task;
releasing the computing resources of the first task as available computing resources;
and re-executing the first task after the interrupt has cleared.
Preferably, before reallocating the available computing resources according to the segmentation strategy, the method further comprises:
predicting, by using a modeling tool or a sandbox environment, how the system's computing resources will run after segmentation according to the segmentation strategy.
Preferably, the method further comprises:
performing a verification test, the verification test comprising a benchmark test, a stress test, and/or a stability test;
if the verification test result shows a negative effect, rolling back to the previous system computing resource allocation configuration;
and if the verification test result does not reach the expected value, adjusting the segmentation strategy or optimizing the execution mode of the task.
Preferably, searching for available computing resources in the system comprises:
detecting the computing resources of the system;
detecting the execution progress and performance indicators of tasks being executed in the system;
and obtaining the available computing resources according to the data on the system computing resources occupied by the executing tasks.
In a second aspect, embodiments of the present application provide a dynamic computing power segmentation system, the system comprising:
a resource and task monitoring module, configured to collect historical load data and task execution patterns of the system, and predict future resource usage by using a machine learning algorithm model;
a load evaluation and demand analysis module, configured to monitor overall load data of the system;
an adjustable resource query module, configured to search for available computing resources in the system;
a judging module, configured to trigger a computing power segmentation decision if the overall load data of the system exceeds a preset load threshold, the resource demand of a single task exceeds a preset task threshold, and/or the predicted future resource usage exceeds the computing capacity of the system;
a resource compatibility checking module, configured to match the available computing resources against the resource parameters of the task to be executed for compatibility and obtain a compatibility result;
a strategy formulation module, configured to formulate a segmentation strategy according to the compatibility result, the available computing resources, and the resource parameters of the task to be executed;
and an execution module, configured to reallocate the available computing resources according to the segmentation strategy.
In a third aspect, embodiments of the present application provide a computer device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the computer program, implements the steps of any one of the above dynamic computing power segmentation methods.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the above dynamic computing power segmentation methods.
In summary, compared with the prior art, the technical solutions provided by the embodiments of the present application have the following beneficial effects:
1. By continuously monitoring the resource usage and load trend of each computing task, predicting future resource usage through a machine learning algorithm model, discovering computing power adjustment demands in time, and formulating a segmentation strategy by setting thresholds and matching tasks with resources, compatibility and accessibility are ensured, resources are fully utilized, waiting time is shortened, and reasonable computing power allocation is achieved;
2. The task ordering rule may be either of, or a combination of, a priority policy and a time-slice round-robin policy. This makes it convenient to order tasks and/or schedule the time for which tasks occupy resources according to characteristics such as urgency and real-time requirements, so that resources are used reasonably.
Detailed Description
The present embodiments are intended only to explain the present application and are not to be construed as limiting it; modifications to these embodiments that involve no creative contribution may be made by those skilled in the art after reading this specification, and all such modifications are protected by patent law within the scope of the claims of the present application.
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the present application.
In addition, the term "and/or" in the present application merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that both A and B exist, or that B exists alone. In the present application, unless otherwise specified, the character "/" generally indicates an "or" relationship between the associated objects.
The terms "first", "second", and the like in the present application are used to distinguish between similar items having substantially the same function and effect. It should be understood that there is no logical or chronological dependency among "first", "second", and "nth", and that they impose no limitation on quantity or order of execution.
The term "at least one" in the present application means one or more, and "a plurality" means three or more; for example, a plurality of first positions means three or more first positions.
Embodiments of the application are described in further detail below with reference to the drawings.
Referring to fig. 1, in one embodiment of the present application, a dynamic computing power segmentation method is provided, and the main steps of the method are described as follows:
S1, collecting historical load data and task execution patterns of a system, and predicting future resource usage by using a machine learning algorithm model;
S2, monitoring overall load data of the system;
S3, searching for available computing resources in the system;
S4, triggering a computing power segmentation decision if the overall load data of the system exceeds a preset load threshold, the resource demand of a single task exceeds a preset task threshold, and/or the predicted future resource usage exceeds the computing capacity of the system;
S5, matching the available computing resources against the resource parameters of the task to be executed for compatibility, and obtaining a compatibility result;
S6, formulating a segmentation strategy according to the compatibility result, the available computing resources, and the resource parameters of the task to be executed;
and S8, reallocating the available computing resources according to the segmentation strategy.
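As a minimal, non-limiting sketch (not part of the claimed driver implementation), the decision flow of steps S4 through S8 can be expressed as follows; the `Task`/`Gpu` data types, field names, and threshold keys are illustrative assumptions, not terms defined by the present application.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    demand: float   # required compute share, 0..1 of one GPU card
    arch: str       # required GPU architecture (a resource parameter)

@dataclass
class Gpu:
    name: str
    arch: str
    free: float     # idle compute share, 0..1

def should_trigger(load, task_demand, predicted, cfg):
    """S4: any one of the three conditions triggers the decision."""
    return (load > cfg["overall_load"]
            or task_demand > cfg["task_resource"]
            or predicted > cfg["capacity"])

def plan_segmentation(task, gpus):
    """S5/S6: keep only architecture-compatible GPUs (the compatibility
    result), then choose the fitting GPU with the most free capacity."""
    compatible = [g for g in gpus if g.arch == task.arch]        # S5
    fitting = [g for g in compatible if g.free >= task.demand]   # S6
    return max(fitting, key=lambda g: g.free) if fitting else None

def reallocate(task, gpu):
    """S8: carve the task's share out of the chosen GPU."""
    gpu.free -= task.demand
    return gpu.name
```

A real driver would act on hardware state rather than plain records, but the trigger/match/reallocate structure is the same.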
In this embodiment, to explain the embodiment fully, a software and hardware cooperation example capable of implementing the method is given below; in other embodiments of the present application, existing software and hardware combinations capable of implementing the same functions may also be used, and these are not described in detail here. Through deep integration with Kubernetes (K8s), efficient scheduling of GPU resources is achieved. By means of the NVIDIA Device Plugin mechanism, K8s can accurately identify and allocate GPU resources; a user only needs to specify the GPU device and video memory requirements in a Pod definition to achieve efficient resource utilization. Thanks to its excellent extensibility, Kubernetes provides a solid guarantee of GPU support for containerized applications. MIG technology is used to multi-instantiate a GPU: the GPU can be finely divided into several mutually independent computing units, so that each application enjoys fine-grained resource isolation while still sharing the hardware, ensuring efficient and safe operation in a virtualized environment. By means of the Kubernetes Horizontal Pod Autoscaler or an independently developed autoscaler plug-in, resources are adjusted intelligently according to the load on the GPU; this automated mechanism can increase or decrease GPU resources automatically.
In this embodiment, a computing system containing a GPU card is first started, and the related driver and management software are loaded, where in this embodiment the driver comprises:
Resource management module
1) Hardware resource awareness
This module is responsible for detecting and identifying the various computing resources in the system, including CPU cores, GPU cores, memory modules, storage devices, and so on. For example, in a system with multiple CPU cores, the driver needs to know exactly the model, frequency, cache size, and other parameters of each core. When identifying a GPU, it determines information such as the GPU model, video memory capacity, video memory bandwidth, and number of CUDA cores (for NVIDIA GPUs) or stream processors (for AMD GPUs).
This information is obtained by communicating with the underlying hardware interfaces of the computer. Taking a Linux system as an example, the driver may collect detailed hardware resource data from the relevant entries in the /proc and /sys file systems, or by calling the BIOS/UEFI interfaces provided by the hardware vendor.
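As an illustrative sketch of the hardware-awareness step, the key/value records exposed by the Linux /proc/cpuinfo entry can be parsed as below; the sample text in the usage is synthetic, and a real driver would query far more sources (/sys, vendor interfaces) than this.

```python
def parse_cpuinfo(text):
    """Parse /proc/cpuinfo-style text into one dict per logical CPU.

    Records are separated by blank lines; each line is 'key : value'.
    """
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus

def read_cpu_resources(path="/proc/cpuinfo"):
    """Hardware-awareness step: read and parse the live CPU inventory
    (only meaningful on a Linux host)."""
    with open(path) as f:
        return parse_cpuinfo(f.read())
```

From the parsed dicts the driver can extract the model, frequency, and cache-size parameters mentioned above.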
Through these operations, information on the available computing resources in the system is obtained, which facilitates subsequent task matching and compatibility matching.
2) Resource allocation policy
Based on the demands of the application programs and the overall state of the system, the resource management module determines the computing power segmentation policy. For example, when multiple applications request computing resources at the same time, the driver may employ a priority policy: an application with high real-time requirements, such as real-time video stream processing software, is given a high priority to ensure that it obtains sufficient computing power first.
A time-slice round-robin policy may also be adopted, distributing computing resources among different applications at fixed time intervals. For example, the right to use a CPU core is switched from one application to another at regular intervals (e.g., every 10 milliseconds), so that each application receives a certain amount of computation time.
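The time-slice round-robin policy just described can be sketched as follows; the millisecond quantum and the notion of "remaining work" are simplifying assumptions for illustration, not part of the claimed driver.

```python
from collections import deque

def round_robin(apps, work, quantum=10):
    """Time-slice round-robin sketch: each app runs for `quantum` ms
    of its remaining `work` (ms), then rejoins the back of the queue.
    Returns the order in which apps finish."""
    queue = deque(apps)
    remaining = dict(work)
    finished = []
    while queue:
        app = queue.popleft()
        remaining[app] -= quantum      # app uses one time slice
        if remaining[app] <= 0:
            finished.append(app)       # done; leaves the rotation
        else:
            queue.append(app)          # back of the queue
    return finished
```

With a 10 ms quantum, a short job finishes within its first slices even when a long job arrived first, which is the fairness property the policy is meant to provide.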
3) Resource monitoring and adjustment
Continuous monitoring of computing resource usage is important. The driver tracks in real time how much of the CPU, GPU, and other resources each application occupies, for example by reading the performance counters provided by the operating system or by monitoring the hardware's own registers.
The driver can adjust automatically when it finds that an application over-occupies or under-utilizes resources. For example, if one data processing application occupies a large amount of GPU resources for a long time, causing other graphics rendering applications to stall, the driver can appropriately reduce the GPU resource allocation of that data processing application to ensure that the graphics rendering applications run properly.
Task scheduling module
1) Task queue management
The driver needs to maintain a task queue storing the computing tasks waiting to be executed. These tasks may come from different applications, and each contains information such as the task type (e.g., CPU-intensive or GPU-intensive), priority, and amount of resources needed. When a new task enters the queue, the driver inserts it into the appropriate position according to its priority and resource requirements. For example, a high-priority deep learning training task may be placed near the front of the task queue so that computing resources are allocated to it as soon as possible.
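The priority-ordered insertion described above can be sketched with a binary heap; the convention "lower number = higher priority" and the task fields are illustrative assumptions.

```python
import heapq
import itertools

class TaskQueue:
    """Priority task queue sketch: lower number = higher priority;
    the monotonic counter preserves insertion order among tasks of
    equal priority (heapq never compares the task dicts themselves)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, name, priority, kind="cpu", demand=0.0):
        task = {"name": name, "kind": kind, "demand": demand}
        heapq.heappush(self._heap, (priority, next(self._counter), task))

    def pop(self):
        """Remove and return the name of the highest-priority task."""
        _, _, task = heapq.heappop(self._heap)
        return task["name"]
```

A high-priority training task pushed after several background tasks still pops first, matching the queue behaviour described above.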
2) Task allocation and execution
The task scheduling module assigns tasks to suitable computing units for execution according to resource availability and task requirements. In a CPU-GPU heterogeneous computing system, it decides whether a task is better executed on the CPU or on the GPU. For example, highly parallel computational tasks such as matrix multiplication may be assigned to the GPU, because the GPU has a large number of parallel computing units and can complete the computation faster.
During task execution, the driver also needs to handle interruption and resumption of tasks. If an interrupt event occurs in the system (e.g., a hardware failure or the insertion of a high-priority task), the driver pauses the task currently being executed, saves its execution state, and then reallocates the computing resources to the new task or to handling the interrupt event. After the interrupt event has been processed, the driver resumes execution of the suspended task.
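The pause/save/resume cycle just described can be sketched as a checkpointed task; real drivers save hardware context rather than a step counter, so this is purely illustrative.

```python
class CheckpointedTask:
    """Interrupt-handling sketch: pause a task, save its execution
    state, release its resources, and later resume from the saved
    state once the interrupt event has been processed."""

    def __init__(self, name, total_steps):
        self.name = name
        self.total = total_steps
        self.step = 0       # progress so far
        self.saved = None   # checkpoint taken on interrupt

    def run(self, steps):
        """Advance the task; returns current progress."""
        self.step = min(self.step + steps, self.total)
        return self.step

    def interrupt(self):
        self.saved = {"step": self.step}  # save execution state
        self.step = 0                     # resources released

    def resume(self):
        self.step = self.saved["step"]    # restore and continue
```

After `interrupt()` the task's resources are free for the interrupting work; `resume()` continues from the checkpoint without repeating finished steps.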
Interface module
1) Application Program Interface (API)
The APIs include function calls, system calls, or object-oriented programming interfaces. For example, an application may request a certain number of CPU cores or a certain amount of GPU memory by calling an API function provided by the driver.
These APIs can also be used by applications to pass task-related information to the driver, such as task priorities and predicted execution times. Meanwhile, the driver can feed the result of resource allocation back to the application through the API, for example whether the required resources were successfully allocated, and the specific details of the allocated resources (e.g., the allocated CPU core numbers or GPU device ID).
2) Operating system interface
These interfaces are used to receive scheduling instructions from the operating system, to report resource usage to the operating system, and so on. For example, when the operating system needs to pause an application to free resources, it sends instructions to the driver via the driver's operating system interface.
The driver can also obtain global information about the system, such as the system load and the state of other devices, through the operating system interface. This helps the driver allocate resources and schedule tasks in a way that suits the overall operating environment of the system.
In this embodiment, virtual GPUs are created using NVIDIA vGPU software, which allows the computing power of one physical GPU to be split into multiple virtual GPUs, each of which can be assigned to a different virtual machine or container. For example, in a data center environment, multiple users can share one physical GPU through vGPU technology while running graphics-intensive applications or deep learning model training. It supports multiple GPU splitting modes, such as fixed allocation (each virtual GPU is allocated a fixed number of CUDA cores, amount of video memory, and other resources) and dynamic allocation (the resources of the virtual GPUs are adjusted dynamically according to application demand). The open-source HAMi segmentation software is used to support several computing power segmentation modes such as MIG and MPS; computing power can be split as finely as 1%, and video memory is split at megabyte granularity. It is fully adapted to both domestic and non-domestic computing hardware, and can perform mixed deployment, unified scheduling, and management of computing clusters of different types, such as NVIDIA, Huawei Ascend, Hygon, Cambricon, Iluvatar CoreX, MetaX, and Moore Threads. Through virtualization, one card serves multiple purposes, improving hardware utilization.
Under a Linux system, NVIDIA GPUs can be monitored in real time with the nvidia-smi command (and AMD GPUs with rocm-smi); using the parameter information fed back by these interfaces, software can collect the various parameters of a GPU card in real time, including but not limited to video memory utilization, core frequency, temperature, and the utilization of each computing unit.
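The nvidia-smi query can be driven from software as sketched below. The `--query-gpu` field names are real nvidia-smi options; the sample output in the usage is synthetic, and `query_gpus` only works on a host with an NVIDIA driver installed.

```python
import subprocess

FIELDS = ["utilization.gpu", "utilization.memory", "temperature.gpu",
          "clocks.sm", "power.draw"]

def parse_smi_csv(text, fields=FIELDS):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`
    output into one dict of floats per GPU line."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        gpus.append({f: float(v) for f, v in zip(fields, values)})
    return gpus

def query_gpus(fields=FIELDS):
    """Collect live GPU parameters (requires an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(fields)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    return parse_smi_csv(out, fields)
```

Polling `query_gpus()` on a timer yields the per-card utilization, temperature, clock, and power readings used as load inputs below.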
In this embodiment, the overall load data is a weighted combination of video memory usage and computing core usage. In other embodiments of the application, other parameters may be used, such as any one, any two, or more of the following in combination:
GPU Utilization: how busy the graphics processing unit is during a particular period, usually expressed as a percentage; 100% means the GPU is running at full load.
Memory Usage: the proportion of video memory in use, also usually expressed as a percentage, or directly as the ratio of used memory to total memory.
Temperature: the operating temperature of the graphics card is also an important indicator of its load. Excessive temperature may cause performance degradation, or the card may automatically lower its frequency to protect the hardware.
Fan Speed: can reflect the effort the graphics card is making to maintain a suitable temperature, usually expressed in revolutions per minute (RPM) or as a percentage of maximum speed.
Power Consumption: the power currently drawn by the graphics card is another key indicator that helps gauge its operating intensity. Power consumption is typically measured in watts (W).
Clock Speed: includes the core frequency and the video memory frequency, and reflects the current running speed of the graphics card. Dynamic frequency adjustment is a common way for modern graphics cards to manage performance and power consumption.
API Call Rate: for some applications, especially games, indicators such as draw calls per second or frame generation time can reflect the load on the graphics card.
Render Latency: the time from when a rendering command is submitted until rendering actually completes can also serve as a load indicator.
Other custom metrics: depending on the application scenario, there may be other load-related metrics, such as throughput or inference time in machine learning tasks.
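The weighted overall load used in this embodiment can be sketched as below; the equal 0.5/0.5 weights are an illustrative assumption, since the application does not specify the weight values.

```python
def overall_load(metrics, weights=None):
    """Weighted overall load score sketch: combine video memory usage
    and compute core usage (this embodiment), or any other metric set
    via a custom weight map. Weights should sum to 1."""
    weights = weights or {"utilization.memory": 0.5,
                          "utilization.gpu": 0.5}
    return sum(metrics[k] * w for k, w in weights.items())
```

Other embodiments simply pass a different weight map covering temperature, power draw, or custom metrics.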
In this embodiment, collecting historical data includes, but is not limited to, collecting performance monitoring data over a period of time, covering key metrics such as CPU, GPU, memory, and network bandwidth. Workload logs are collected, recording workload conditions over different periods, such as task submission frequency, task type, and execution time.
Task execution patterns: similar workloads are grouped using a clustering algorithm to better understand the resource requirements of different types of tasks.
Then, an existing machine learning algorithm model is used for predictive analysis: combining the historical load data and task execution patterns, the machine learning algorithm predicts load changes over a coming period. If an impending resource bottleneck is predicted, i.e., future resource usage will exceed the system's computing capacity, a computing power segmentation decision can be made even if the overall load threshold is not currently exceeded.
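As a minimal stand-in for the prediction model (the application leaves the choice of machine learning algorithm open), a least-squares trend line over the historical load series can be extrapolated to flag an impending bottleneck:

```python
def predict_next(history, horizon=1):
    """Fit a least-squares line to the load series and extrapolate
    `horizon` steps ahead. A toy substitute for the ML model."""
    n = len(history)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history))
    den = sum((x - x_mean) ** 2 for x in xs)
    slope = num / den if den else 0.0
    # value of the fitted line at x = n - 1 + horizon
    return y_mean + slope * (n - 1 + horizon - x_mean)

def bottleneck_predicted(history, capacity, horizon=1):
    """Trigger early if the extrapolated load exceeds capacity."""
    return predict_next(history, horizon) > capacity
```

A steadily rising series triggers the decision before the current load crosses any threshold, which is exactly the early-trigger behaviour described above.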
3) Evaluating existing resources
Resource utilization analysis: evaluate the utilization of existing resources and find resources that are idle or over-used, i.e., the available computing resources. Bottlenecks in the system, such as CPU/GPU bottlenecks, memory bottlenecks, and network bottlenecks, are identified and their impact on overall performance is analyzed.
Threshold comparison: several thresholds are set, including an overall GPU load threshold (e.g., video memory utilization above 80% and computing core utilization above 70%) and a per-task resource shortage threshold (e.g., a task's video memory demand grows by more than a certain proportion in a short time and approaches its current allocation limit). In use, a resource ceiling is also set for each task or user group to prevent a single task from occupying excessive resources. When the monitored data exceeds these thresholds, a computing power segmentation decision is triggered.
In this embodiment, the computing power segmentation decision is triggered when any one of the following holds: the overall load data of the system exceeds the overall load threshold, a single task's resource demand exceeds the resource shortage threshold, or the predicted future resource usage exceeds the system's computing capacity. In other embodiments of the present application, any two, or all three, of these trigger conditions may be required together.
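The "any one" versus "all" trigger modes can be sketched as below, using the example thresholds from the preceding paragraph (80% video memory, 70% compute core); both numbers are the illustrative values quoted above, not fixed by the application.

```python
THRESHOLDS = {"utilization.memory": 80.0, "utilization.gpu": 70.0}

def exceeds_thresholds(metrics, thresholds=THRESHOLDS, mode="any"):
    """Threshold comparison sketch: `mode` selects whether any one
    monitored metric, or all of them, must exceed its threshold,
    mirroring the alternative trigger modes described above."""
    hits = [metrics[k] > t for k, t in thresholds.items()]
    return any(hits) if mode == "any" else all(hits)
```

This embodiment corresponds to `mode="any"`; embodiments requiring every condition use `mode="all"`.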
Available GPU resources in the system, i.e., the available computing resources, are then searched for, including other idle GPU cards, under-utilized portions of the current GPU card (e.g., some computing units being idle), and computing resources that can be temporarily reclaimed from low-priority tasks. Then a resource compatibility check is performed to ensure that the resources to be allocated are compatible with the currently running tasks and system environment, for example checking whether the GPU architecture, driver version, video memory type, and so on match, so as to obtain a compatibility matching result between tasks and available computing resources, i.e., the compatibility result. In this embodiment, similar workloads are grouped with a clustering algorithm to better understand the resource requirements of different types of tasks.
Finally, based on task characteristics, a segmentation strategy is formulated according to the compatibility result, the available computing resources, and the resource parameters of the task to be executed. The resource parameters of the task to be executed are the GPU architecture, driver version, video memory type, and so on required for its execution. For high-priority key tasks (such as online deep learning inference services), the computing power demand is guaranteed first; for tasks with low real-time requirements (such as background model pre-training tasks), the computing power allocation can be adjusted appropriately. When segmenting computing power, the load is kept as balanced as possible across GPU cards, avoiding situations where some GPU cards are overused while others sit idle. Tasks and computing power can also be distributed reasonably according to performance differences between GPU cards (e.g., different GPU models have different computing power). In other embodiments of the present application, the segmentation strategy may be formulated according to other parameters that need to be considered.
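A greedy load-balancing allocation in the spirit of the strategy above can be sketched as follows; the priority convention (lower number = higher priority), the capacity model, and all names are illustrative assumptions.

```python
def balance_assign(tasks, gpus):
    """Load-balancing sketch: serve high-priority tasks first, and
    place each task (name, demand, priority) on the GPU with the most
    free capacity, so no card is overloaded while others sit idle.
    `gpus` maps GPU name -> free compute share."""
    plan = {}
    # lower priority number = more urgent, so it is served first
    for name, demand, priority in sorted(tasks, key=lambda t: t[2]):
        target = max(gpus, key=gpus.get)      # least-loaded card
        if gpus[target] >= demand:
            gpus[target] -= demand
            plan[name] = target
    return plan
```

A high-priority inference task lands on the freest card first; the background pre-training task then goes to whichever card remains least loaded, keeping utilization even.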
The computing power of the GPU card is then adjusted dynamically through the interface provided by the GPU management software or the driver. This may include reallocating video memory, adjusting the number of computing cores allocated, changing the execution queues of tasks, and so on.
According to the present application, the resource usage and load trend of each computing task are continuously monitored, future resource usage is predicted through a machine learning algorithm model, computing power adjustment demands are discovered in time, and the segmentation strategy is formulated by setting thresholds and matching tasks with resources; compatibility and accessibility are ensured, resources are fully utilized, waiting time is shortened, and reasonable computing power allocation is achieved.
Referring to fig. 2, in another embodiment of the present application, further comprising:
S61, setting a task ordering rule, wherein the task ordering rule comprises a task priority policy and/or a time-slice round-robin policy;
S62, maintaining a task queue according to the task ordering rule, wherein the task queue stores tasks to be executed and the resource parameters of the tasks to be executed;
wherein step S6, formulating a segmentation strategy according to the compatibility result, the available computing resources, and the resource parameters of the task to be executed, becomes:
S63, formulating a segmentation strategy according to the compatibility result, the task queue, and the available computing resources.
In this embodiment, the task ordering rule may be either of, or a combination of, the priority policy and the time-slice round-robin policy. If the task ordering rule is set as a priority policy, tasks can conveniently be ordered according to characteristics such as urgency and real-time requirements, and resources are used reasonably.
Referring to fig. 3, in another embodiment of the present application, further comprising:
S301, setting a task ordering rule, wherein the task ordering rule comprises a priority policy;
S302, acquiring the priorities of the tasks being executed through the priority policy;
S303, releasing the computing resources of executing tasks whose priority is lower than the preset low-priority threshold as available computing resources, and/or releasing the computing resources of executing tasks whose priority is lower than that of the task to be executed as available computing resources.
In this embodiment, priorities are set for tasks; when the overall load data of the system exceeds the overall load threshold, the resources of low-priority tasks are taken over so that high-priority tasks can execute first, reducing the delay or stutter experienced at the user end. When a low-priority task's resources are taken over, that task re-enters the ordering. In other examples of this embodiment, a low-priority task whose computing resources were taken over may instead wait for the occupying task to finish computing and then reclaim its original computing resources.
In this embodiment, the preset low-priority threshold is a priority level preset before resources are released from low-priority tasks. For example, when fine-grained computing power segmentation is required, the preset low-priority threshold is set to 3, so that the resources occupied by all tasks of priority 3 are released. In other examples of this embodiment, the preset priority is the priority level of tasks below the threshold currently exceeded by the resource shortage, so as to achieve dynamic resource scheduling: more tasks with low real-time requirements are released, providing computing resources for high-priority tasks.
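Step S303 can be sketched as follows. The numeric convention (lower number = higher priority, so "priority lower than" means a larger number) and the task record fields are illustrative assumptions.

```python
def release_low_priority(executing, low_threshold=None,
                         incoming_priority=None):
    """S303 sketch: free the resources of executing tasks whose
    priority is at or below the preset low-priority threshold, and/or
    below the priority of the incoming task to be executed.
    Lower number = higher priority. Returns total freed capacity."""
    freed = 0.0
    for task in executing:
        at_or_below_threshold = (low_threshold is not None
                                 and task["priority"] >= low_threshold)
        below_incoming = (incoming_priority is not None
                          and task["priority"] > incoming_priority)
        if at_or_below_threshold or below_incoming:
            freed += task["demand"]
            task["released"] = True   # task re-enters the ordering
    return freed
```

With a threshold of 3, every task of priority 3 or lower-priority releases its resources, matching the example in the paragraph above.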
Referring to fig. 4, in another embodiment of the present application, further comprising:
S311, when a first task among the tasks being executed is interrupted:
S312, saving the execution state of the first task;
S313, releasing the computing resources of the first task as available computing resources;
and S314, re-executing the first task after the interrupt has cleared.
In this embodiment, the first task is any type of task being executed, and no limitation is placed on it here.
When the first task is interrupted, the resources it occupied are released, so that those resources can be fully utilized, improving the flexibility of resource usage.
Referring to fig. 5, in another embodiment of the present application, before step S8, the method further includes:
S7, predicting, by using a modeling tool or a sandbox environment, how the system's computing resources will run after segmentation according to the segmentation strategy.
In this embodiment, a modeling tool or sandbox environment is used to predict the impact of the adjustment and to evaluate possible risks and benefits, reducing the adverse impact of running the segmentation strategy.
Referring to fig. 6, in another embodiment of the present application, further comprising:
S9, performing verification tests, where the verification tests include a benchmark test, a stress test and/or a stability test;
S10, if the verification test shows a negative effect, rolling back to the previous system computing-resource allocation configuration;
and S11, if the verification test result does not reach the expected value, adjusting the segmentation strategy or optimizing the execution mode of the task.
In this embodiment, the verification test includes any one or more of a benchmark test, a stress test, and a stability test.
Benchmark test: a series of benchmarks covering daily operations and extreme cases is run to evaluate the effect of the new configuration.
Stress test: a high load is applied to the system to check its performance under limit conditions and ensure its robustness.
Stability test: the system is run for a long time to observe whether performance degradation or other abnormal conditions occur.
The performance data under the new configuration is compared with the previous baseline data to confirm whether the expected optimization objective has been reached. The configuration is then further adjusted according to the test results and continuously optimized until all targets are met. If a new configuration is found to have a negative effect, it is immediately rolled back to the previous configuration and the adjustment scheme is re-evaluated. If the performance of a task falls short of expectations, its computing-power allocation may be adjusted appropriately, or the execution of the task on the GPU card may be optimized.
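The verify-then-rollback-or-tune logic of steps S9 to S11 can be sketched as a simple decision function. The throughput metric, the 5% target gain, and the three-way outcome are illustrative placeholders; real verification would compare the full KPI set against the baseline.

```python
# Hedged sketch of S9-S11: compare new-configuration metrics against the
# baseline; a regression triggers rollback, a below-target gain triggers
# further tuning, and meeting the target accepts the configuration.
def verify(new_metrics, baseline_metrics, target_gain=0.05):
    """Return 'rollback', 'tune', or 'accept' based on throughput change."""
    gain = (new_metrics["throughput"] - baseline_metrics["throughput"]) \
           / baseline_metrics["throughput"]
    if gain < 0:
        return "rollback"  # S10: negative effect -> restore previous config
    if gain < target_gain:
        return "tune"      # S11: below expectation -> adjust the strategy
    return "accept"

baseline = {"throughput": 100.0}
print(verify({"throughput": 95.0}, baseline))   # rollback
print(verify({"throughput": 102.0}, baseline))  # tune
print(verify({"throughput": 110.0}, baseline))  # accept
```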
The specific fine-tuning procedure can be divided into the following steps:
1) Preparation and planning
Determine the optimization targets: identify the specific goals expected from fine-tuning, such as improving the performance of specific tasks, reducing overall cost, or improving resource utilization.
Select tools: choose appropriate monitoring and management tools, such as the NVIDIA System Management Interface (nvidia-smi), Prometheus, and Grafana, for collecting and analyzing data.
2) Current-state assessment
Collect baseline data:
Use monitoring tools to record the current resource usage (CPU, GPU, memory, network, etc.).
Record key performance indicators (KPIs) such as response time, throughput, and error rate.
Identify bottlenecks:
Analyze the existing data to find where resource allocation is unreasonable or where performance bottlenecks lie.
Determine which tasks or services contend for resources during peak hours.
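Baseline collection as in step 2 can be sketched as sampling a monitoring source and summarizing each KPI. In practice the sampler would pull from nvidia-smi or Prometheus; here it is a stub, and the KPI names are assumptions for the example.

```python
# Sketch of baseline collection: gather n samples of {kpi: value} from a
# monitoring source and summarize the mean per KPI, so later configurations
# can be compared against this baseline.
import statistics

def record_baseline(sample_fn, n=5):
    """Collect n KPI samples and return the per-KPI mean."""
    samples = [sample_fn() for _ in range(n)]
    kpis = samples[0].keys()
    return {k: statistics.mean(s[k] for s in samples) for k in kpis}

# Stub sampler standing in for a real monitoring agent.
fake = iter([{"response_ms": 10, "error_rate": 0.0},
             {"response_ms": 12, "error_rate": 0.0},
             {"response_ms": 11, "error_rate": 0.1}])
baseline = record_baseline(lambda: next(fake), n=3)
print(baseline["response_ms"])  # 11.0
```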
3) Formulate an adjustment scheme
Define adjustment parameters:
Task priority: set priorities for different tasks according to their business importance.
Resource quota: set a resource upper limit for each task or user group to prevent a single task from occupying excessive resources.
Dynamic scheduling: enable or optimize the existing dynamic scheduling algorithm to automatically adjust resource allocation according to the real-time load.
Isolation mechanism: consider using containerization (Docker, Kubernetes) or virtual-machine technology to isolate tasks, ensuring more flexible and controllable resource allocation.
Simulate the effect:
Use a modeling tool or sandbox environment to predict the impact of the adjustment and assess possible risks and benefits.
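A toy "sandbox" prediction for step 3 might model how the proposed caps change per-task allocation before anything is applied. The greedy, demand-ordered allocation below is an illustrative assumption, not the patent's model.

```python
# Toy sandbox prediction: given task demands, per-task caps, and total
# resource units, predict the allocation a proposed adjustment would yield.
def simulate(tasks, caps, total):
    """tasks: {name: demand}; caps: {name: max share}; total: units.
    Allocates greedily by demand, never exceeding cap, demand, or remainder."""
    alloc, remaining = {}, total
    for name, demand in sorted(tasks.items(), key=lambda kv: -kv[1]):
        give = min(demand, caps.get(name, total), remaining)
        alloc[name] = give
        remaining -= give
    return alloc

pred = simulate({"train": 8, "infer": 3}, {"train": 6}, total=10)
print(pred)  # {'train': 6, 'infer': 3}
```

Running such a what-if before rollout is what lets risks (here, the training task being capped below its demand) be seen and weighed in advance.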
4) Implement the adjustment
Small-scale pilot:
Test the new configuration in a non-production environment first to ensure its stability and effectiveness.
Collect data during the pilot to compare performance before and after the adjustment.
Gradual rollout:
Based on the pilot results, gradually roll out the new configuration in the production environment, starting from low-risk areas.
After each rollout stage, closely monitor system performance to ensure no new problems are introduced.
5) Verify the effect
Benchmark test:
Run a series of benchmarks covering daily operations and extreme cases to evaluate the effect of the new configuration.
Stress test:
Apply a high load to the system and check its performance under limit conditions to ensure robustness.
Stability test:
Run the system for a long time and observe whether performance degradation or other abnormal conditions occur.
6) Analysis and iteration
Comparative analysis:
Compare the performance data under the new configuration with the previous baseline data to confirm whether the expected optimization objective has been reached.
Continuous improvement:
Further adjust the configuration according to the test results and continue optimizing until all targets are met.
If a new configuration is found to have a negative effect, immediately roll back to the previous configuration and re-evaluate the adjustment scheme.
If a task's performance falls short of expectations, its computing-power allocation may be adjusted appropriately, or the execution of the task on the GPU card may be optimized.
Through the arrangement of this embodiment, the practicality, stability, and other characteristics of the updated computing-power segmentation strategy are evaluated, reducing system faults and improving system reliability.
Referring to fig. 7, in another embodiment of the present application, step S3 includes:
S31, detecting system computing resources;
S32, detecting the execution progress and performance indicators of the tasks being executed in the system;
and S33, obtaining the available computing resources according to the data on system computing resources occupied by the executing tasks.
With this arrangement, the available computing resources of the system can be accurately detected. The system computing resources include CPU cores, GPU cores, memory modules, storage devices, and the like. For a system with multiple CPU cores, the driver needs to know the exact model, frequency, cache size, and other parameters of each core.
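Steps S31 to S33 amount to subtracting occupied resources from system totals. The sketch below assumes a simple resource-dictionary representation; a real implementation would populate the totals and per-task usage from live monitoring (e.g. nvidia-smi queries).

```python
# Sketch of S31-S33: detect total system resources, subtract what running
# tasks occupy, and report what remains available for segmentation.
import os

def available_resources(total, running_tasks):
    """total: {resource: amount}; running_tasks: list of {"usage": {...}}.
    Returns the free amount of each resource in total."""
    used = {}
    for t in running_tasks:
        for k, v in t["usage"].items():
            used[k] = used.get(k, 0) + v
    return {k: total[k] - used.get(k, 0) for k in total}

total = {"cpu_cores": os.cpu_count() or 8, "gpu_slices": 7, "mem_gb": 64}
running = [{"usage": {"gpu_slices": 3, "mem_gb": 20}},
           {"usage": {"gpu_slices": 2, "mem_gb": 10}}]
free = available_resources(total, running)
print(free["gpu_slices"], free["mem_gb"])  # 2 34
```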
In another embodiment of the present application, further comprising:
Notifying an administrator and recording an operation log;
Continuously monitoring the running condition of the system computing resources.
In this embodiment, the administrator is notified of the execution of the computing-power segmentation by mail, system message, or similar means; the notification includes the reason for the segmentation, the adjustments made, the impact on task performance, and other information. Recording an operation log means writing the relevant data of the whole segmentation process into a log file, including monitoring data, the basis of the segmentation decision, and the operations executed, enabling subsequent auditing and analysis. Continuous monitoring means returning to the resource and task monitoring step and tracking the computing-power usage and task load of the GPU cards in real time, forming a closed-loop dynamic management flow so that the next segmentation adjustment can be made promptly as the system changes.
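The notify-and-log step can be sketched with the standard `logging` module. The notification transport is stubbed (real code might use smtplib or a chat webhook), and the log-entry fields mirror the items the embodiment lists: reason, adjustment, and impact.

```python
# Sketch of the notify-and-log step: write each segmentation decision to an
# audit log and send the administrator a notification via a pluggable hook.
import json, logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("split-audit")

def report_split(reason, adjustment, impact, notify_fn):
    entry = {"reason": reason, "adjustment": adjustment, "impact": impact}
    log.info("split executed: %s", json.dumps(entry))  # operation log
    notify_fn(f"Compute split applied: {reason}")      # mail/IM to the admin
    return entry

sent = []
report_split("overall load > threshold", "train capped at 6 slices",
             "infer latency -15%", sent.append)
print(sent[0])  # Compute split applied: overall load > threshold
```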
In another embodiment of the present application, the method further comprises:
and synchronizing and migrating data.
In this embodiment, if the segmentation involves movement of data between different GPU memory regions, accurate synchronization and migration of the data must be ensured. For example, in distributed deep-learning training, model parameters and data may need to be redistributed among different GPU cards.
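A checksum-verified migration between two simulated memory regions can illustrate the synchronization requirement. This is a CPU-only stand-in: the regions are plain dictionaries, and real GPU code would use framework primitives (e.g. a torch `Tensor.to` transfer or NCCL collectives) instead.

```python
# Hedged sketch: migrate a parameter block between two simulated GPU memory
# regions, verifying a checksum before the source copy is discarded, so the
# segmentation never leaves the data in an inconsistent state.
import hashlib

def migrate(src_region, key):
    data = src_region.pop(key)                       # remove from source
    digest = hashlib.sha256(bytes(data)).hexdigest()
    dst_data = list(data)                            # "copy" to destination
    # Verify the destination copy matches before committing the migration.
    assert hashlib.sha256(bytes(dst_data)).hexdigest() == digest
    return dst_data

gpu0 = {"model_params": [1, 2, 3, 4]}
moved = migrate(gpu0, "model_params")
print(moved, "model_params" in gpu0)  # [1, 2, 3, 4] False
```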
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In one embodiment of the present application, a dynamic computing force splitting system is provided, corresponding to the dynamic computing force splitting method in the above embodiments. The dynamic computing force splitting system comprises:
The resource and task monitoring module, used for collecting historical load data of the system and task execution modes, and predicting future resource usage by means of a machine learning algorithm model;
The load evaluation and demand analysis module, used for monitoring the overall load data of the system;
The adjustable resource query module, used for searching available computing resources in the system;
The judging module, used for triggering a computing-power segmentation decision if the overall load data of the system exceeds a preset load threshold, the resource demand of a single task exceeds a preset task threshold, and/or the future resource usage exceeds the system computing capacity;
The resource compatibility checking module, used for matching the compatibility of the available computing resources and the resource parameters of the tasks to be executed to obtain a compatibility result;
The strategy making module, used for formulating a segmentation strategy according to the compatibility result, the available computing resources, and the resource parameters of the tasks to be executed;
and the execution module, used for reallocating the available computing resources according to the segmentation strategy.
Each module of the above dynamic computing force splitting system may be implemented entirely or partly by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
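The cooperation of these modules can be sketched as one pipeline. The class interfaces, stubbed metrics, and thresholds below are assumptions made for illustration; the patent does not fix a concrete API for the modules.

```python
# Structural sketch: the monitoring, judging, strategy-making, and execution
# modules wired into a single decision pipeline.
class Monitor:
    def overall_load(self):
        return 0.92  # stub standing in for live overall load data

class Judge:
    def __init__(self, load_threshold):
        self.load_threshold = load_threshold
    def should_split(self, load):
        return load > self.load_threshold  # trigger condition

class PolicyMaker:
    def make(self, free_slices, demand):
        # Simplified strategy: grant as much as is both free and demanded.
        return min(free_slices, demand)

class Executor:
    def apply(self, slices):
        return f"reallocated {slices} slices"

def pipeline():
    monitor, judge = Monitor(), Judge(load_threshold=0.85)
    if judge.should_split(monitor.overall_load()):
        plan = PolicyMaker().make(free_slices=4, demand=6)
        return Executor().apply(plan)
    return "no action"

print(pipeline())  # reallocated 4 slices
```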
In one embodiment of the present application, a computer device is provided, which may be a server. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device may be implemented by any type of volatile or nonvolatile memory device, including but not limited to magnetic disks, optical discs, EEPROM (electrically erasable programmable read-only memory), EPROM (erasable programmable read-only memory), SRAM (static random-access memory), ROM (read-only memory), magnetic memory, flash memory, and PROM (programmable read-only memory). The memory of the computer device provides an environment for the running of the operating system and computer programs stored therein. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the steps of the dynamic computing force splitting method of the above embodiments.
In one embodiment of the present application, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the dynamic computing force splitting method of the above embodiments. The computer-readable storage medium includes ROM (read-only memory), RAM (random-access memory), CD-ROM (compact disc read-only memory), magnetic disks, floppy disks, and the like.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated as an example. In practical applications, the above functions may be allocated to different functional units and modules as required; that is, the internal structure of the apparatus of the present application may be divided into different functional units or modules to perform all or part of the functions described above.