
CN121092303A - GPU scheduling methods, devices, systems, and storage media for autonomous driving - Google Patents

GPU scheduling methods, devices, systems, and storage media for autonomous driving

Info

Publication number
CN121092303A
CN121092303A
Authority
CN
China
Prior art keywords
scheduling
gpu
local node
dispatching
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202511129406.4A
Other languages
Chinese (zh)
Inventor
扶元地
涂超平
冯荻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mushroom Car Union Information Technology Co Ltd
Original Assignee
Mushroom Car Union Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mushroom Car Union Information Technology Co Ltd filed Critical Mushroom Car Union Information Technology Co Ltd
Priority to CN202511129406.4A priority Critical patent/CN121092303A/en
Publication of CN121092303A publication Critical patent/CN121092303A/en
Pending legal-status Critical Current

Landscapes

  • Multi Processors (AREA)

Abstract

Embodiments of the present application provide a GPU scheduling method, device, system, and storage medium for autonomous driving, relating to the technical field of resource scheduling. The method comprises: when a preset trigger condition is met, generating at least two candidate scheduling policy combinations according to the registration information of each task module of the local node and the service-level scheduling rules acquired from a configuration management module; testing each candidate scheduling policy combination through each task module registered with the local node, determining the current optimal scheduling policy combination of the local node, and storing it in the configuration management module; and receiving resource scheduling requests from each task module of the local node and allocating GPU resources to those requests according to the current optimal scheduling policy combination of the local node. By testing multiple candidate policies against actual feedback from the task modules, the method dynamically adapts to resource demands under different load scenarios and enhances the stability and real-time responsiveness of the system.

Description

GPU scheduling method, device, system, and storage medium for autonomous driving
Technical Field
The present application relates to the technical field of resource scheduling, and in particular to a GPU scheduling method, device, system, and storage medium for autonomous driving.
Background
With the rapid development of artificial intelligence technology, autonomous driving systems have become highly dependent on the high computational power provided by parallel computing devices such as GPUs. These systems are typically built on embedded heterogeneous platforms, which must balance cost, power consumption, and performance. However, increasing AI model complexity (e.g., large-model applications) makes computational resources relatively scarce, while autonomous driving demands extremely high system stability and real-time performance. In resource-constrained environments, improving GPU utilization and reducing subsystem latency through efficient scheduling has therefore become a key challenge in guaranteeing system performance.
To address these problems, various GPU resource management schemes have been proposed in the related art. For example, GPU resource pooling shares resources by partitioning them at fine granularity; multi-card cooperative operation improves overall computing efficiency by means of a distributed computing architecture; and container-based GPU resource sharing and isolation meets the independent usage requirements of different tasks by partitioning resources.
In the process of implementing the embodiments of the present application, the inventors found that the related art has at least the following problems:
The related art improves GPU resource utilization to some extent by optimizing resource allocation and sharing. However, it is mainly oriented toward the resource-sharing requirements of general scenarios; it is difficult to adapt to the specific scenarios of an autonomous driving system and the differences among its subsystems, and it cannot effectively guarantee system stability and real-time performance when the GPU is under high load.
Disclosure of Invention
Embodiments of the present application provide a GPU scheduling method, device, system, and storage medium for autonomous driving.
In a first aspect of the embodiments of the present application, a GPU scheduling method for autonomous driving is provided, applied to a scheduling center, comprising:
when a preset trigger condition is met, generating at least two candidate scheduling policy combinations according to the registration information of each task module of the local node and the service-level scheduling rules acquired from a configuration management module;
testing each candidate scheduling policy combination through each task module registered with the local node, determining the current optimal scheduling policy combination of the local node, and storing the current optimal scheduling policy combination of the local node in the configuration management module; and
receiving resource scheduling requests from each task module of the local node, and allocating GPU resources to those requests according to the current optimal scheduling policy combination of the local node.
In an optional embodiment of the present application, the GPU scheduling method for autonomous driving further comprises:
acquiring scheduling-center cluster node information from the configuration management module, and acquiring the load state of the local node;
synchronizing the load state of the local node to the other scheduling centers in the scheduling-center cluster, and receiving the load states of the nodes where the other scheduling centers are located;
when the load state of the local node meets a preset transfer condition, transferring at least part of the resource scheduling requests to the scheduling centers of other nodes meeting a preset load condition; and
receiving resource scheduling requests transferred by the scheduling centers of other nodes, allocating GPU resources to the transferred requests according to the current optimal scheduling policy combination of the local node, and returning the execution results to the originating task modules of the other nodes.
In an optional embodiment of the present application, allocating GPU resources to the resource scheduling requests of each task module of the local node according to the current optimal scheduling policy combination of the local node comprises:
queuing the resource scheduling requests of each task module of the local node according to the registration information of each task module and the current optimal scheduling policy combination of the local node;
granting GPU resource execution permission to the task modules whose resource scheduling requests meet the execution conditions of the local node, so that they can execute their actual computing tasks; and
reclaiming the GPU resource execution permission of each task module of the local node after its actual computing task has been executed.
In a second aspect of the embodiments of the present application, another GPU scheduling method for autonomous driving is provided, applied to a task module, comprising:
starting a client and registering with the scheduling center of the local node, wherein the registration information comprises explicit scheduling requirements and GPU execution unit information;
receiving a test instruction issued by the scheduling center of the local node, and controlling the GPU execution unit to execute a test task according to the test instruction and the GPU execution unit information;
when an actual computing task needs to be executed, sending a resource scheduling request to the scheduling center of the local node according to the explicit scheduling requirements;
after obtaining the GPU resource execution permission granted by the scheduling center of the local node, controlling the GPU execution unit to occupy GPU resources and execute the actual computing task according to the GPU execution unit information; and
releasing the GPU resource execution permission after the actual computing task has been executed, and notifying the scheduling center of the local node.
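The task-module lifecycle above (register, request, execute, release) can be sketched as follows. All class and method names (`SchedulingCenter`, `TaskModuleClient`, etc.) are illustrative assumptions, not part of the patent's actual implementation; the center here grants every registered module immediately, whereas the real one queues per the optimal policy combination.

```python
class SchedulingCenter:
    """Minimal stand-in for the local node's scheduling center."""
    def __init__(self):
        self.registry = {}

    def register(self, name, scheduling_requirements, gpu_unit_info):
        self.registry[name] = (scheduling_requirements, gpu_unit_info)

    def request_resources(self, name):
        # A real center would queue the request per the optimal policy
        # combination; here every registered module is granted at once.
        return name in self.registry

    def release(self, name):
        # The real center is notified so it can reclaim the permission.
        return True


class TaskModuleClient:
    """Follows the register -> request -> execute -> release flow."""
    def __init__(self, name, center):
        self.name = name
        self.center = center

    def start(self, scheduling_requirements, gpu_unit_info):
        self.center.register(self.name, scheduling_requirements, gpu_unit_info)

    def run_task(self, task):
        if not self.center.request_resources(self.name):
            return None
        try:
            return task()  # occupy GPU resources, run the real job
        finally:
            self.center.release(self.name)  # always release and notify


center = SchedulingCenter()
client = TaskModuleClient("perception", center)
client.start({"priority": 1, "max_wait_ms": 50}, {"memory_mb": 2048})
result = client.run_task(lambda: "done")
```

The `try`/`finally` mirrors the requirement that the permission is released even if the computing task fails.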
In an optional embodiment of the present application, the GPU scheduling method for autonomous driving further comprises:
when the resource scheduling request has been transferred to the scheduling center of another node, receiving from that scheduling center the execution result returned after the actual computing task has been executed according to the GPU execution unit information and the current optimal scheduling policy combination of the other node.
In a third aspect of the embodiments of the present application, another GPU scheduling method for autonomous driving is provided, applied to a configuration management module, comprising:
when a scheduling center starts up, providing the scheduling center with initial service-level scheduling rules and/or scheduling-center cluster node information;
receiving and storing the current optimal scheduling policy combination reported by the scheduling center;
when the service-level scheduling rules are updated, storing the updated, latest service-level scheduling rules; and
when the scheduling center generates candidate scheduling policy combinations, providing the scheduling center with the latest service-level scheduling rules and the historically stored optimal scheduling policy combinations.
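The configuration-management module's four duties can be sketched as a small store. The class, method names, and example rule values are assumptions for illustration; the patent does not prescribe an API.

```python
class ConfigManagementModule:
    """Illustrative store for rules, cluster info, and policy history."""
    def __init__(self, initial_rules, cluster_nodes):
        self.rules = initial_rules          # service-level scheduling rules
        self.cluster_nodes = cluster_nodes  # scheduling-center cluster info
        self.history = []                   # optimal combos, newest last

    def on_center_startup(self):
        # Duty 1: provide initial rules and cluster node information.
        return self.rules, self.cluster_nodes

    def report_optimal(self, combo):
        # Duty 2: store the optimal combination reported by a center.
        self.history.append(combo)

    def update_rules(self, new_rules):
        # Duty 3: keep only the latest service-level rules.
        self.rules = new_rules

    def for_candidate_generation(self, last_n=3):
        # Duty 4: latest rules plus the most recent stored optima.
        return self.rules, self.history[-last_n:]


cfg = ConfigManagementModule({"max_wait_ms": 50}, ["node-a", "node-b"])
cfg.report_optimal({"order": ["perception", "planning"]})
cfg.update_rules({"max_wait_ms": 40})
rules, history = cfg.for_candidate_generation()
```

Keeping only the last `last_n` entries matches the "most recent preset number" described in the optional embodiment below, though the exact count is configurable.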
In an optional embodiment of the present application, the latest service-level scheduling rules are a version of the initial service-level scheduling rules adjusted for business requirements, and the historically stored optimal scheduling policy combinations comprise the most recent preset number of optimal scheduling policy combinations.
In a fourth aspect of the embodiments of the present application, a GPU scheduling device for autonomous driving is provided, comprising a processor and a memory storing program instructions, wherein the processor is configured to perform, when the program instructions are executed, the GPU scheduling method for autonomous driving of the first, second, or third aspect of the embodiments of the present application.
In a fifth aspect of the embodiments of the present application, a system is provided, comprising:
a system body; and
a GPU scheduling device for autonomous driving according to the fourth aspect of the embodiments of the present application, installed in the system body.
In a sixth aspect of the embodiments of the present application, a computer-readable storage medium is provided, storing program instructions that, when executed, cause a computer to perform the GPU scheduling method for autonomous driving of the first, second, or third aspect of the embodiments of the present application.
The GPU scheduling method, device, system, and storage medium for autonomous driving provided by the embodiments of the present application have the following beneficial effects:
The embodiments of the present application ensure the flexibility of the scheduling scheme by dynamically generating diversified candidate policies, realistically evaluate the performance of each policy through actual tests by the task modules so as to accurately screen out the optimal policy, and finally execute resource allocation based on the optimal policy. By generating diversified candidate policy combinations and combining them with actual test feedback from the task modules, the method dynamically adapts to resource demands under different load scenarios, avoids the limitations and rigidity of fixed static policies, reduces the blindness of scheduling decisions, and effectively reduces task waiting time and execution jitter. The system can adaptively adjust the resource allocation order, alleviate GPU resource contention, and improve overall utilization, while enhancing system stability and real-time responsiveness by reducing the processing delay of key tasks.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and constitute a part of this specification, illustrate embodiments of the application and, together with the description, serve to explain the application; they do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of an overall architecture of a system provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a GPU scheduling method for autonomous driving according to an embodiment of the present application;
FIG. 3 is a schematic diagram of another GPU scheduling method for autonomous driving according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another GPU scheduling method for autonomous driving according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a GPU scheduling device for autonomous driving according to an embodiment of the present application.
Reference numerals:
100: configuration management module; 101: node; 102: scheduling center; 103: task module;
800: GPU scheduling device for autonomous driving; 801: processor; 802: memory; 803: communication interface; 804: bus.
Detailed Description
To make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in detail below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, not an exhaustive list of all embodiments. It should be noted that, in the absence of conflict, the embodiments of the present application and the features therein may be combined with each other.
Referring to FIG. 1, an embodiment of the present application provides a system comprising a configuration management module 100 and a plurality of nodes 101 connected to the configuration management module 100. Each node 101 includes a scheduling center 102 and a plurality of task modules 103 coupled to the scheduling center 102. The scheduling centers 102 of different nodes 101 are connected to each other and to the task modules 103 of the other nodes 101. The system also comprises a processor electrically connected to the above modules for controlling them to perform the actions described below.
FIG. 2 is a schematic diagram of a GPU scheduling method for autonomous driving according to an embodiment of the present application. Any of the following methods may be executed in the system, or in a server or terminal device communicatively connected to the system.
Based on the above system structure, as shown in FIG. 2, an embodiment of the present application provides a GPU scheduling method for autonomous driving, applied to a scheduling center, comprising:
S21: when a preset trigger condition is met, generating at least two candidate scheduling policy combinations according to the registration information of each task module of the local node and the service-level scheduling rules acquired from the configuration management module.
S22: testing each candidate scheduling policy combination through each task module registered with the local node, determining the current optimal scheduling policy combination of the local node, and storing it in the configuration management module.
S23: receiving resource scheduling requests from each task module of the local node, and allocating GPU resources to those requests according to the current optimal scheduling policy combination of the local node.
In the embodiments of the present application, the preset trigger condition comprises a specific event or condition. The specific events include a system-initialization-complete event, a new-task-module-registration-success event, a system-load-change-reaching-threshold event, and/or a periodic-time-point event. The system-initialization-complete event refers to all basic services being ready after the autonomous driving system starts up; the new-task-module-registration-success event refers to a new perception or planning module completing registration with the scheduling center; the system-load-change event refers to the GPU utilization or the task queue length exceeding a preset threshold; and the periodic-time-point event triggers a policy update at fixed time intervals.
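The four trigger events can be checked with a small predicate. This is a minimal sketch: the threshold and period values are assumed defaults for illustration, and the event names are invented labels, since the patent leaves all of these configurable.

```python
def should_regenerate_policies(event, gpu_util=0.0, queue_len=0,
                               seconds_since_last=0.0,
                               util_threshold=0.8, queue_threshold=20,
                               period_s=300.0):
    """Return True when the preset trigger condition is met."""
    # System-initialization-complete and new-module-registration events
    # always start the adaptive policy-generation flow.
    if event in ("init_complete", "module_registered"):
        return True
    # Load-change event: GPU utilization or queue length over threshold.
    if event == "load_change":
        return gpu_util > util_threshold or queue_len > queue_threshold
    # Periodic event: a fixed time interval has elapsed.
    if event == "tick":
        return seconds_since_last >= period_s
    return False
```

A scheduling center would call this on every incoming event and only then run the candidate-generation and testing steps of S21 and S22.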
In the embodiments of the present application, the registration information comprises configuration data and/or execution parameters of the task module. The configuration data comprise a task priority and/or a resource requirement specification defined by the explicit scheduling requirements; the execution parameters comprise code logic and/or a data set defined by the GPU execution unit information. The task priority is the ordering weight of the task module in the resource allocation sequence; the resource requirement specification covers GPU memory size or compute-core-count requirements; the code logic is the executable instruction set of the task module; and the data set is the input data required by the task module at runtime.
In the embodiments of the present application, the service-level scheduling rules comprise business constraint rules and/or performance constraint rules. The business constraint rules comprise module priority rules and/or task dependency rules; the performance constraint rules comprise maximum-allowed-waiting-time rules and/or maximum-allowed-execution-period rules. The module priority rules define the resource allocation priority order of the different task modules; the task dependency rules define the execution-order logic among task modules; the maximum-allowed-waiting-time rule defines the longest queuing time of a resource scheduling request; and the maximum-allowed-execution-period rule defines the longest duration for which a task module may occupy GPU resources.
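One possible in-memory representation of these service-level scheduling rules is sketched below. The field names, module names, and example millisecond values are assumptions for illustration; the patent only specifies the rule categories.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessConstraints:
    module_priority: dict = field(default_factory=dict)    # module -> rank
    task_dependencies: dict = field(default_factory=dict)  # module -> prereqs


@dataclass
class PerformanceConstraints:
    max_wait_ms: int = 50          # longest queuing time for a request
    max_exec_period_ms: int = 200  # longest GPU occupancy per task


@dataclass
class ServiceLevelRules:
    business: BusinessConstraints
    performance: PerformanceConstraints


rules = ServiceLevelRules(
    BusinessConstraints(
        module_priority={"perception": 0, "planning": 1},
        task_dependencies={"planning": ["perception"]},  # planning waits
    ),
    PerformanceConstraints(max_wait_ms=50, max_exec_period_ms=200),
)
```

The dependency entry expresses the execution-order logic (planning runs after perception), while the two millisecond fields map directly onto the waiting-time and execution-period limits described above.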
In the embodiments of the present application, the candidate scheduling policy combinations comprise diversified resource allocation schemes generated based on initial scheduling rules, where the initial scheduling rules are generated from the registration information and/or the service-level scheduling rules. The diversified resource allocation schemes comprise task-execution-order schemes and/or resource-quota-allocation schemes: a task-execution-order scheme arranges the scheduling order of the task modules, and a resource-quota-allocation scheme defines the partitioning ratios of GPU compute units or video memory.
In the embodiments of the present application, the test comprises the scheduling center controlling the task modules to execute simulated tasks and collecting performance metrics. The scheduling center issues a test instruction to a task module, and the task module executes the test task according to its GPU execution unit information. The test instruction comprises a start-time parameter, a duration parameter, and/or an execution-frequency parameter: the start-time parameter specifies when the test task starts, the duration parameter specifies how long it runs, and the execution-frequency parameter specifies its invocation interval. The performance metrics comprise task scheduling delay and/or GPU resource utilization metrics.
In the embodiments of the present application, determining the current optimal scheduling policy combination comprises comparing composite performance indexes and selecting the best value. The composite performance index is obtained by weighting the task scheduling delay, the GPU resource utilization, and/or a temperature index; the best value is the candidate scheduling policy combination whose composite performance index is minimized or maximized. The task scheduling delay measures the time from the request to the execution of a task, the GPU resource utilization measures compute-core usage, and the temperature index measures the GPU hardware operating temperature.
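The weighted comparison can be sketched as below. The weights, the normalization, and the "lower is better" convention are assumptions; the patent only states that the metrics are weighted into a composite index and the best candidate selected.

```python
def composite_score(delay_ms, gpu_util, temperature_c,
                    w_delay=0.5, w_util=0.3, w_temp=0.2):
    """Weighted composite index; lower is better under these assumptions."""
    # Delay and temperature are costs; utilization is a benefit, so the
    # shortfall (1 - utilization) is scored as a cost instead.
    return (w_delay * delay_ms
            + w_temp * temperature_c
            + w_util * (1.0 - gpu_util) * 100.0)


def pick_optimal(candidates):
    """candidates: list of (name, delay_ms, gpu_util, temperature_c)."""
    return min(candidates, key=lambda c: composite_score(*c[1:]))[0]


best = pick_optimal([
    ("combo-A", 40.0, 0.70, 65.0),
    ("combo-B", 25.0, 0.85, 70.0),
])
```

Here combo-B wins because its lower delay and higher utilization outweigh its slightly higher temperature under the chosen weights; different weightings could reverse the outcome.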
In the embodiments of the present application, GPU resource allocation comprises the scheduling center processing resource scheduling requests based on the current optimal scheduling policy combination. Processing a resource scheduling request comprises a queuing operation, a permission-granting operation, and/or a permission-reclaiming operation: the queuing operation orders resource scheduling requests according to the registration information and the current optimal scheduling policy combination; the permission-granting operation allocates the right to use GPU resources to task modules meeting the execution conditions; and the permission-reclaiming operation releases the occupied GPU resources after the task completes.
With the GPU scheduling method for autonomous driving provided by the embodiments of the present application, the flexibility of the scheduling scheme is ensured by dynamically generating diversified candidate policies; the performance of each policy is realistically evaluated through actual tests by the task modules so that the optimal policy is accurately screened out; and resource allocation is finally executed based on the optimal policy. By generating diversified candidate policy combinations and combining them with actual test feedback from the task modules, the method dynamically adapts to resource demands under different load scenarios and avoids the limitations of static policies. The blindness of scheduling decisions is reduced, the rigidity of fixed policies is avoided, and task waiting time and execution jitter are effectively reduced. The system can adaptively adjust the resource allocation order, alleviate GPU resource contention, and improve overall utilization, while enhancing system stability and real-time responsiveness by reducing the processing delay of key tasks.
The GPU scheduling method for autonomous driving further comprises: acquiring scheduling-center cluster node information from the configuration management module and acquiring the load state of the local node; synchronizing the load state of the local node to the other scheduling centers in the scheduling-center cluster and receiving the load states of the nodes where those scheduling centers are located; when the load state of the local node meets a preset transfer condition, transferring at least part of the resource scheduling requests to the scheduling centers of other nodes meeting a preset load condition; and receiving resource scheduling requests transferred by the scheduling centers of other nodes, allocating GPU resources to the transferred requests according to the current optimal scheduling policy combination of the local node, and returning the execution results to the originating task modules of the other nodes.
In an embodiment of the present application, obtaining the load state of the local node comprises monitoring resource usage metrics and/or task execution metrics. The resource usage metrics reflect hardware resource consumption, and the task execution metrics reflect task processing efficiency. The resource usage metrics include GPU utilization, temperature, and/or frequency; the task execution metrics include average task waiting time, execution delay, and/or queue length. By continuously collecting these metrics, the scheduling center dynamically evaluates the load level of the local node, e.g., as low, medium, high, or overloaded.
In an embodiment of the present application, the preset transfer condition comprises the load level of the local node reaching a high-load or overload state and/or the global cluster load view indicating that a low-load node exists. The load level is dynamically evaluated from real-time monitoring data. A high-load or overload state means the load level exceeds a first load threshold, i.e., the corresponding resource usage metric and/or task execution metric exceeds a first preset threshold, such as GPU utilization above 80% and/or average task waiting time above 50 milliseconds. A low-load node means the load level is below a second load threshold, i.e., the corresponding metrics are below a second preset threshold, e.g., GPU utilization below 30%, average task waiting time below 20 milliseconds, and/or task queue length below 5. When a critical performance indicator, such as delay, begins to deteriorate, a transfer of resource scheduling requests is triggered.
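Using the example thresholds given above (80% utilization / 50 ms waiting time for high load; 30% / 20 ms / queue length below 5 for low load), the transfer decision can be sketched as follows. The exact values are configurable in practice, and the three-level classification here is a simplification of the low/medium/high/overload levels.

```python
def load_level(gpu_util, avg_wait_ms, queue_len):
    """Classify a node's load from its monitored metrics."""
    if gpu_util > 0.80 or avg_wait_ms > 50:
        return "high"
    if gpu_util < 0.30 and avg_wait_ms < 20 and queue_len < 5:
        return "low"
    return "medium"


def should_transfer(local, cluster_view):
    """Transfer part of the requests when the local node is highly loaded
    and the global cluster view shows at least one low-load node."""
    return load_level(*local) == "high" and any(
        load_level(*node) == "low" for node in cluster_view)


# Local node overloaded; node A lightly loaded, node B moderately loaded.
transfer = should_transfer((0.92, 60, 30), [(0.25, 10, 2), (0.70, 40, 12)])
```

The `cluster_view` argument stands in for the global load view built from the periodically broadcast load states of the other scheduling centers.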
In the embodiment of the present application, the other nodes satisfying the preset load condition include nodes at a low load level. The load level is dynamically updated through the global view of the scheduling-center cluster, with the low-load state corresponding to resource usage and/or task execution metrics below the second preset threshold. The other nodes must be reachable within the scheduling-center cluster, and load states are synchronized through periodic broadcasts or threshold-triggered broadcasts so that the target node can process transferred requests promptly.
The scheduling center first acquires the scheduling-center cluster node information from the configuration management module and monitors the load state of the local node; a global load view is constructed by synchronizing the local node's load state with the other scheduling centers in the cluster and receiving the load states of the other nodes. When the load of the local node meets the preset transfer condition (e.g., high load) and other nodes are lightly loaded, part of the resource scheduling requests are transferred to the qualifying scheduling centers of the other nodes. After receiving a transferred request, the scheduling center of the receiving node allocates GPU resources according to its local current optimal scheduling policy combination and returns the execution result to the originating task module. Because the current optimal scheduling policy combination is generated by testing candidate combinations and thus dynamically adapts to the load scenario, the system can use cluster resources to spread the load when the local node's resources are strained, reduce task queuing time, and avoid the delay fluctuations caused by single-point overload, thereby improving overall resource utilization and system stability while guaranteeing real-time responses for key tasks.
Optionally, allocating GPU resources to the resource scheduling requests of each task module of the local node according to the current optimal scheduling policy combination of the local node comprises: queuing the resource scheduling requests of each task module according to the registration information of each task module and the current optimal scheduling policy combination of the local node; granting GPU resource execution permission to the task modules whose resource scheduling requests meet the execution conditions of the local node, so that they can execute their actual computing tasks; and reclaiming the GPU resource execution permission of each task module after its actual computing task has been executed.
In the embodiment of the present application, queuing the resource scheduling requests of each task module of the local node according to the registration information and the current optimal scheduling policy combination comprises sorting the sequence of resource scheduling requests based on the task priorities and/or resource requirement specifications of the task modules and the module scheduling order. For example, the task priority determines the position weight of a high-priority task in the queue, the resource requirement specification is matched against GPU memory size and/or compute-core-count requirements, and the module scheduling order adjusts the queue order according to the resource-quota-allocation scheme in the current optimal scheduling policy combination.
In the embodiment of the present application, granting GPU resource execution permission to a task module whose resource scheduling request meets the execution conditions of the local node depends on the resource availability state and the time scheduling constraints: the resource availability state means that the idle proportion of GPU compute units reaches a preset threshold, and the time scheduling constraints mean that the execution time window defined by the maximum-allowed-waiting-time rule and/or the maximum-allowed-execution-period rule is open when the resource scheduling request satisfies the admission conditions defined by the current optimal scheduling policy.
In the embodiment of the present application, reclaiming the GPU resource execution permission of each task module of the local node after its actual computing task has been executed is triggered by a task-completion event, which includes the normal termination of the computing task or the end of the execution period. For example, normal termination is actively notified by the task module via a completion status, and the end of the execution period is triggered by the scheduling center monitoring a time threshold according to the maximum-allowed-execution-period rule in the service-level scheduling rules.
When the scheduling center queues resource scheduling requests based on the registration information of each task module of the local node and the current optimal scheduling policy combination, the optimal combination is generated through dynamic testing and fully incorporates the service-level scheduling rules under the real-time load scenario. By granting GPU resource execution permission only to task modules that meet the execution condition, resource allocation strictly follows the optimal policy combination verified by actual measurement, so that tasks with high priority or matched resource requirements obtain execution permission in time. The mechanism that forcibly reclaims the permission after the actual computing task completes is based on the resource reclamation rule of the optimal policy combination, avoiding idle resources and unnecessary occupation. Through this dynamic queuing and permission control mechanism, resource allocation conflicts caused by fixed scheduling rules are effectively reduced, and task waiting time and execution delay are decreased, thereby improving GPU resource utilization and system real-time performance.
In practical application, the dispatching center receives registration information from the clients of the task modules on the physical node, the registration information comprising explicit dispatching requirements and GPU execution unit information, while the configuration management module loads the initial service-level dispatching rules and dispatching center cluster node information. When the preset trigger condition is met, the adaptive dispatching strategy generation flow is started: the latest service-level dispatching rules and the historically stored optimal dispatching strategy combinations are obtained from the configuration management module; initial dispatching rules are generated based on the explicit dispatching requirements, the latest service-level dispatching rules, and the historically stored optimal dispatching strategy combinations; and multiple candidate dispatching strategy combinations are generated from the initial dispatching rules, each comprising a task execution order and a resource allocation scheme adapted to the explicit dispatching requirements. The candidate combinations are then selected in turn: for each, a test instruction specifying a start time, duration, and execution frequency is issued to the task modules, the GPU execution units of the task modules are controlled to execute test tasks based on the GPU execution unit information, performance data during the test is monitored and recorded, and a comprehensive performance index is calculated for the candidate combination. Based on these indexes, the current optimal dispatching strategy combination is selected and reported to the configuration management module. The dispatching center then receives the resource scheduling requests of the task modules and allocates GPU resources according to the current optimal dispatching strategy combination.
The method further comprises continuously monitoring the local node load level, broadcasting the node's own load level information to the other dispatching centers in the cluster, based on the dispatching center cluster node information, either periodically or when the local load exceeds a preset threshold, and simultaneously receiving the load level information of the other dispatching centers to update a global cluster load view. When the local node load level is high-load or overloaded (local load greater than a first load threshold) and the global cluster load view shows that a low-load node exists (load less than a second load threshold), part of the resource scheduling requests in the local waiting queue are transferred to the dispatching center of the low-load node according to the optimal dispatching strategy combination and the explicit dispatching requirements. Conversely, resource scheduling requests transferred by other dispatching centers are received and executed by proxy according to the explicit dispatching requirements and GPU execution unit information of the corresponding task modules and the node's own optimal dispatching strategy combination, and the execution results are returned. When the preset trigger condition is met again, the latest service-level dispatching rules and the historically stored optimal dispatching strategy combinations are obtained from the configuration management module once more, and candidate strategy generation and the subsequent steps are repeated.
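The cross-node transfer decision can be sketched as a simple threshold check over the global cluster load view. Load values normalized to [0, 1], the threshold values, and the view format are all assumptions for illustration.

```python
def pick_transfer_target(local_load, cluster_view,
                         high_threshold=0.8, low_threshold=0.3):
    """When the local node is high-load or overloaded (above the first
    threshold) and the global cluster load view contains a low-load node
    (below the second threshold), return the least-loaded node that should
    receive part of the local waiting queue; otherwise return None."""
    if local_load <= high_threshold:
        return None  # local node can keep serving its own queue
    low_nodes = {node: load for node, load in cluster_view.items()
                 if load < low_threshold}
    if not low_nodes:
        return None  # no low-load node in the global view
    return min(low_nodes, key=low_nodes.get)

view = {"node-b": 0.2, "node-c": 0.7, "node-d": 0.1}
print(pick_transfer_target(0.9, view))  # node-d (least loaded)
print(pick_transfer_target(0.5, view))  # None (local load acceptable)
```

A real dispatching center would additionally decide *which* queued requests to transfer, using the optimal dispatching strategy combination and the modules' explicit dispatching requirements.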
Based on the above system structure, as shown in fig. 3, an embodiment of the present application provides another GPU scheduling method for autopilot, applied to a task module, where the method includes:
S31, starting the client and registering the client with a dispatching center of the local node, wherein registration information comprises explicit dispatching requirements and GPU execution unit information.
S32, receiving a test instruction issued by the dispatching center of the local node, and controlling the GPU execution unit, based on the GPU execution unit information, to execute a test task according to the test instruction.
And S33, when the actual calculation task needs to be executed, sending a resource scheduling request to a scheduling center of the local node according to the explicit scheduling requirement.
S34, after the GPU resource execution permission granted by the dispatching center of the local node is obtained, the GPU execution unit is controlled to occupy the GPU resource to execute the actual computing task according to the GPU execution unit information.
And S35, releasing the GPU resource execution authority after the actual computing task is executed, and informing a dispatching center of the local node.
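Steps S31 to S35 can be sketched as a client lifecycle. The interface below is hypothetical: direct method calls on a stub center stand in for the network protocol a real client would use, and all names are illustrative.

```python
class StubCenter:
    """Stand-in for the local node's dispatching center (for illustration)."""
    def __init__(self):
        self.registered = False
        self.held = 0
    def register(self, requirements, gpu_unit_info):
        self.registered = True
    def request(self, requirements):
        self.held += 1
        return {"cores": 8}        # granted GPU resource range
    def release(self, grant):
        self.held -= 1             # permission reclaimed, resources freed

class TaskModuleClient:
    """Client lifecycle for steps S31-S35."""
    def __init__(self, center, requirements, gpu_unit_info):
        self.center = center
        self.requirements = requirements     # explicit scheduling requirements
        self.gpu_unit_info = gpu_unit_info   # GPU execution unit information

    def start(self):                         # S31: register with local center
        self.center.register(self.requirements, self.gpu_unit_info)

    def on_test_instruction(self, instruction):  # S32: non-business test task
        return {"latency_ms": 1.0, "instruction": instruction}

    def run_actual_task(self, work):         # S33-S35
        grant = self.center.request(self.requirements)   # S33: send request
        try:
            return work(grant)               # S34: occupy GPU, execute task
        finally:
            self.center.release(grant)       # S35: release and notify center

center = StubCenter()
client = TaskModuleClient(center, {"priority": 0}, {"kernel": "detect"})
client.start()
result = client.run_actual_task(lambda grant: grant["cores"] * 2)
print(result, center.held)  # 16 0
```

The `try/finally` pattern mirrors the requirement that the permission is released even when the actual computing task raises an error, so the center's resource state stays consistent.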
In the embodiment of the application, the starting client means that the task module starts a software component for interacting with the dispatching center, the client can be a program module built in the task module or an independently operated process, for example, the client of the perception module is responsible for processing communication with the dispatching center, the client of the planning module bears registration and request sending functions, and the client realizes data transmission through a preset communication protocol.
In the embodiment of the application, registration of the task module with the dispatching center of the local node refers to the process in which the task module submits its own information to the dispatching center through the client and establishes an association. The registration process comprises sending registration information containing the task module's identifier and waiting for the dispatching center to confirm receipt; for example, a registration data packet is transmitted through a network interface, and the dispatching center returns a confirmation signal after verifying the integrity of the information, completing registration and ensuring that the dispatching center can identify and manage the task module.
In the embodiment of the application, controlling the GPU execution unit, based on the GPU execution unit information, to execute the test task according to the test instruction means that the task module drives the GPU execution unit, using its hardware parameters and execution code, to perform computing operations that do not belong to the actual service according to the parameters of the test instruction. The test task includes simulated image feature extraction and/or virtual path planning computation, and is used to generate scheduling performance data without affecting the actual functions of the autonomous driving system.
In the embodiment of the application, when the actual calculation task is required to be executed, the task module is in a scene with real service processing requirements, for example, when a camera acquires new image data, the perception module is required to execute a target recognition task, and when a vehicle is about to enter an intersection, the planning module is required to calculate a driving route, and at the moment, the task module triggers the sending of a resource scheduling request.
In the embodiment of the application, sending a resource scheduling request to the scheduling center of the local node according to the explicit scheduling requirement means that the task module generates a request message based on its explicit scheduling requirements, such as its own priority and resource demands, and sends it to the scheduling center. The resource scheduling request may be a fast-computation request sent by the perception module based on a high-priority requirement, or a specific resource request sent by the planning module based on a video memory requirement, so that the scheduling center can understand the demand accurately.
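The shape of such a request message can be sketched as a small serializable record; the field names and JSON encoding are assumptions for illustration, not a wire format defined by the patent.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ResourceSchedulingRequest:
    """Illustrative request message derived from the module's explicit
    scheduling requirements; the field names are assumptions."""
    module_id: str
    priority: int    # e.g. the perception module sends high priority (0)
    memory_mb: int   # video memory demand, e.g. from the planning module
    cores: int       # requested computing cores

req = ResourceSchedulingRequest("perception", priority=0, memory_mb=4096, cores=8)
payload = json.dumps(asdict(req))   # serialized for transmission to the center
print(payload)
```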
In the embodiment of the application, after obtaining the execution permission of the GPU resource granted by the dispatching center of the local node, the task module receives the permission of the dispatching center to allow the specified GPU resource to be used, the permission defines the available GPU resource range (such as a specific computing core and a specific video memory area) and the use time limit, and the task module starts to prepare to execute the actual computing task after confirming that the permission is valid.
In the embodiment of the application, releasing the GPU resource execution permission after the actual computing task is completed means that the task module ends its occupation of the allocated GPU resources once the task finishes, which includes releasing the occupied computing cores and clearing the used video memory space. For example, the perception module releases its 8 occupied computing cores after target recognition is completed, and the planning module clears 2 GB of video memory after generating a path, so that the resources can be used by other task modules.
In the embodiment of the application, the notification of the dispatching center of the local node means that the task module sends the task completion information to the dispatching center after releasing the execution authority of the GPU resource, and the notification mode can be that a message containing the task identifier and the completion state is sent or a preset signal mechanism is triggered so that the dispatching center can update the resource state in time and process the subsequent resource dispatching request.
By adopting the automatic driving GPU scheduling method provided by the embodiment of the application, the task module starts the client to register explicit scheduling requirements (such as priority and resource requirements) and GPU execution unit information (such as codes and data) to the local scheduling center, basic input parameters are provided for the scheduling center to generate diversified candidate scheduling strategy combinations, and the strategy customization is ensured to adapt to the differential requirements of different task modules. And then, the task module receives test instructions (such as appointed starting time, duration time and execution frequency) issued by the dispatching center, controls the GPU execution unit to execute test tasks of non-actual service according to the GPU execution unit information, and helps the dispatching center to accurately evaluate comprehensive performance indexes of candidate strategies and select a current optimal dispatching strategy combination by simulating a real load scene to generate performance data (such as task delay and resource utilization). When the actual computing task needs to be executed, the task module actively transmits a resource scheduling request based on the explicit scheduling requirement, and after the GPU resource execution permission granted by the scheduling center is obtained, the GPU execution unit is controlled to occupy resources to execute the task, and after the task is completed, the permission is actively released and the scheduling center is notified, so that the resource application and recovery flow are standardized, and the idle or conflict of the GPU resource is avoided. 
Through the dual roles of the task module, as test executor and actual resource requester, the accuracy of scheduling decisions is optimized and task waiting time and execution fluctuation are reduced, thereby improving GPU resource utilization and enhancing the stability and real-time responsiveness of the autonomous driving system.
Optionally, the autonomous driving GPU scheduling method further comprises: when the resource scheduling request is transferred to the scheduling center of another node, receiving the execution result returned after the scheduling center of that node executes the actual computing task according to the GPU execution unit information and its own current optimal scheduling policy combination.
In the embodiment of the application, the execution result returned after the dispatching center of the other node executes the actual calculation task comprises that the original task module directly acquires the task output data executed in different places without modification request or active intervention, the execution result is returned by the target node dispatching center through a communication interface after the task is completed, the returned content comprises the calculation task termination state and the service output data, and the original task module continues the subsequent process based on the returned data, so that the task continuity and the data consistency are ensured. For example, the execution result includes the target recognition coordinate set of the perception module or the path trajectory data generated by the planning module, and the task termination state includes a normal completion flag or an error code.
In this way, when the resource scheduling request is transferred to the scheduling center of the other node, the scheduling center of the other node uses the received GPU execution unit information (provided during the registration of the task module, including specific execution parameters such as code and data) and the current optimal scheduling policy combination thereof to efficiently execute the actual computing task, and then the task module directly receives the returned execution result without the active intervention or modification request of the original task module. Therefore, the interfaces of the task modules are kept compact and unified, and extra processing burden caused by resource transfer is avoided. Meanwhile, the execution result is directly transmitted back to the original task module, so that the continuity and data consistency of the calculation task are ensured, and the delay or error risk during cross-node execution is reduced, thereby optimizing the response efficiency of the system.
In practical application, the task module starts its client and registers with the local dispatching center, reporting registration information that comprises the explicit dispatching requirements and the GPU execution unit information. It then receives a test instruction generated by the dispatching center of the local node based on the latest service-level dispatching rules and the historically stored optimal dispatching strategy combinations, and controls the GPU execution unit, based on the GPU execution unit information, to execute the test task according to that instruction; the execution parameters of the test task are adapted to the explicit dispatching requirements and the latest service-level dispatching rules. When an actual service function needs to be executed, the task module sends a resource scheduling request to the local dispatching center based on the explicit dispatching requirements; after obtaining the GPU resource execution permission granted by the local dispatching center based on the optimal dispatching strategy combination, it controls the GPU execution unit, based on the GPU execution unit information, to occupy GPU resources and execute the actual computing task, releases the GPU resources after execution completes, and notifies the dispatching center of the local node. When the resource scheduling request is transferred to the dispatching center of another node, the task module receives the execution result returned after that dispatching center executes the task according to the GPU execution unit information and its own optimal dispatching strategy combination.
Based on the above system structure, as shown in fig. 4, an embodiment of the present application provides another GPU scheduling method for autopilot, applied to a configuration management module, where the method includes:
S41, when the dispatching center is started, the dispatching center is provided with the initial service-level dispatching rules and/or dispatching center cluster node information.
S42, receiving and storing the current optimal scheduling policy combination reported by the scheduling center.
And S43, when the scheduling rule of the service level is updated, storing the updated scheduling rule of the latest service level.
S44, when the scheduling center generates the candidate scheduling policy combination, the scheduling center is provided with the latest service level scheduling rule and the historically stored optimal scheduling policy combination.
In the embodiment of the application, the dispatching center cluster node information comprises identifiers of all nodes in the dispatching center cluster and network connection parameters, wherein the identifiers are used for uniquely identifying each dispatching center node, and the network connection parameters comprise IP addresses and port numbers and are used for realizing the establishment of communication links among the dispatching center nodes. For example, the identifier may be a node ID or a hostname, and the network connection parameters may include an IPv4 address 192.168.1.1 and port 8080, or an IPv6 address and port 9090.
In the embodiment of the application, the historically stored optimal scheduling policy combinations comprise the optimal scheduling policy combinations of the most recent preset number of times, where the preset number N is an integer and N ≥ 2; this retains historical optimization experience while eliminating stale policies. For example, N may be set to 5 to store the 5 most recent optimal policy combinations, or to 10 to store the 10 most recent ones.
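A bounded history of this kind maps naturally onto a fixed-length deque; the class and method names below are illustrative.

```python
from collections import deque

class PolicyHistory:
    """Keep only the N most recent optimal scheduling policy combinations
    (N >= 2), so candidate generation can draw on recent optimization
    experience while stale policies are automatically discarded."""
    def __init__(self, n=5):
        assert n >= 2, "the preset number N must be an integer >= 2"
        self.history = deque(maxlen=n)  # oldest entry evicted automatically

    def store(self, policy):
        self.history.append(policy)

    def recent(self):
        return list(self.history)

h = PolicyHistory(n=3)
for name in ["p1", "p2", "p3", "p4"]:
    h.store(name)
print(h.recent())  # ['p2', 'p3', 'p4'] — the oldest policy was evicted
```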
In the embodiment of the application, the latest business-level dispatching rules comprise versions of the initial business-level dispatching rules after business demand adjustment, and the business demand adjustment comprises urgent task priority change or execution period update. For example, an urgent task priority change may raise the obstacle detection module priority to a maximum, and an execution period update may shorten the maximum allowable execution period of the path planning module from 100 milliseconds to 80 milliseconds.
By adopting the GPU dispatching method for automatic driving, which is provided by the embodiment of the application, when a dispatching center is started, a configuration management module actively provides an initial service level dispatching rule (such as module priority and maximum allowed waiting time) and dispatching center cluster node information, and lays a global strategy foundation for the dispatching center to generate diversified candidate dispatching strategy combinations and cross-node load balancing. And continuously receiving and storing the current optimal scheduling strategy combination reported by the scheduling center to form a historical optimal strategy library. And when the business level dispatching rule is updated, the latest version is stored in real time, so that the timeliness of the strategy is ensured. When the scheduling center regenerates the candidate scheduling policy combination, the latest service rule and the historical optimal policy combination are synchronously provided, so that the initial scheduling rule generation can not only integrate the latest service requirement, but also inherit the historical optimization experience. Therefore, the configuration management module ensures the dynamic adaptability and system consistency of the scheduling strategy by centralized data management, avoids information faults during strategy iteration, and optimizes the accuracy and continuity of resource scheduling decisions.
Optionally, the latest business level scheduling rule is a version of the initial business level scheduling rule after business requirement adjustment, and the historically stored optimal scheduling policy combination comprises the optimal scheduling policy combination of the latest preset times.
In this way, the configuration management module defines the latest business-level dispatching rule as a version (such as emergency task priority change or execution period update) of the initial rule after business requirement adjustment, ensures that the dispatching center can fuse the latest business constraint when generating candidate dispatching strategy combination, avoids the disjoint of the strategy and the current requirement, and simultaneously, the historically stored optimal dispatching strategy combination only keeps the most recent preset times (such as the most recent 5 times) of optimization results, eliminates the old strategy, so that the dispatching center can refer to the historical optimization experience (such as the effective strategy in a high-load scene) and avoid the interference of the outdated strategy when generating the new candidate strategy. The business rule adjustment can act on policy generation in real time, and the carefully chosen historical policy library guarantees continuity of scheduling optimization, so that accuracy and scene adaptability of resource allocation decisions are cooperatively improved.
In practical application, when the dispatching center is started, the configuration management module provides it with the initial service-level dispatching rules and dispatching center cluster node information for use in the system initialization and registration stage. When the dispatching center starts the adaptive dispatching strategy generation flow, the configuration management module provides it with the latest service-level dispatching rules and the historically stored optimal dispatching strategy combinations, which the dispatching center uses to generate the initial dispatching rules in the candidate strategy generation stage. In the optimal strategy selection stage, the configuration management module receives and stores the optimal dispatching strategy combination reported by the dispatching center as part of the historically stored combinations for the subsequent generation of candidate dispatching strategy combinations. When the service-level dispatching rules are updated, the updated latest rules are stored and provided to the dispatching center the next time the adaptive dispatching strategy generation flow is started.
Referring to fig. 5, an embodiment of the present application provides an autopilot GPU scheduler 800, including a processor (processor) 801 and a memory (memory) 802. Optionally, the apparatus may also include a communication interface (Communication Interface) 803 and a bus 804. The processor 801, the communication interface 803, and the memory 802 may communicate with each other via the bus 804. The communication interface 803 may be used for information transfer. The processor 801 may invoke logic instructions in the memory 802 to perform the autonomous GPU scheduling method of the above embodiments.
Further, the logic instructions in the memory 802 described above may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product.
The memory 802 is a computer readable storage medium, and may be used to store a software program, a computer executable program, and program instructions/modules corresponding to the methods in the embodiments of the present application. The processor 801 executes functional applications and data processing by running program instructions/modules stored in the memory 802, i.e., implements the GPU scheduling method for autopilot in the above-described embodiments.
The memory 802 may include a storage program area that may store an operating system, application programs required for at least one function, and a storage data area that may store data created according to the use of the terminal device, etc. In addition, memory 802 may include high-speed random access memory, and may also include non-volatile memory.
The embodiment of the application provides a system comprising a system body and the above autonomous driving GPU scheduling apparatus 800, where the apparatus 800 is installed on the system body. The mounting relationship described herein is not limited to placement within the system, but includes mounting connections to other components of the system, including but not limited to physical, electrical, or signal-transmission connections. Those skilled in the art will appreciate that the autonomous driving GPU scheduling apparatus 800 may be adapted to any feasible system body, thereby implementing other feasible embodiments.
The embodiment of the application provides a computer readable storage medium, which stores computer executable instructions configured to execute the GPU scheduling method for automatic driving.
The technical solution of the embodiment of the present application may be embodied in the form of a software product, where the software product is stored in a storage medium, and includes one or more instructions to cause a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method of the embodiment of the present application. The storage medium may be a non-transitory storage medium, including a usb disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, etc., which may store the program code.
The above description and the drawings illustrate embodiments of the application sufficiently to enable those skilled in the art to practice them. Other embodiments may involve structural, logical, electrical, process, and other changes. The embodiments represent only possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in, or substituted for, those of others. Moreover, the terminology used in the present application is for the purpose of describing embodiments only and is not intended to limit the claims. As used in the description of the embodiments and the claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this disclosure is meant to encompass any and all possible combinations of one or more of the associated listed items. Furthermore, when used in the present disclosure, the terms "comprises," "comprising," and/or variations thereof mean that the recited features, integers, steps, operations, elements, and/or components are present, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in a process, method, or apparatus comprising that element. In this context, each embodiment may be described with emphasis on its differences from the other embodiments, and for the same or similar parts the various embodiments may be referred to one another. For the methods, products, etc. of the embodiments of the present application, where they correspond to the method sections, reference may be made to the description of those method sections for relevant details.
Those of skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. The skilled person may use different methods for each particular application to achieve the described functionality, but such implementation is not to be considered as beyond the scope of the embodiments of the present application. It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the embodiments disclosed herein, the disclosed methods, articles of manufacture (including but not limited to devices, apparatuses, etc.) may be practiced in other ways. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the units may be merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form. The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to implement the present embodiment. In addition, each functional unit in the embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code comprising one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially concurrently, or in the reverse order, depending upon the functionality involved. Likewise, in the description corresponding to the flowcharts and block diagrams, operations or steps corresponding to different blocks may occur in an order different from that disclosed in the description, and sometimes no specific order exists between different operations or steps. Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks therein, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.

Claims (10)

1. A GPU scheduling method for autonomous driving, applied to a scheduling center, the method comprising:
when a preset trigger condition is met, generating at least two candidate scheduling policy combinations according to registration information of each task module of the local node and service-level scheduling rules acquired from a configuration management module;
testing each candidate scheduling policy combination through each task module registered with the local node, determining a current optimal scheduling policy combination of the local node, and storing the current optimal scheduling policy combination of the local node in the configuration management module; and
receiving resource scheduling requests from the task modules of the local node, and allocating GPU resources to the resource scheduling requests of the task modules of the local node according to the current optimal scheduling policy combination of the local node.
2. The method according to claim 1, further comprising:
acquiring scheduling-center cluster node information from the configuration management module, and acquiring a load state of the local node;
synchronizing the load state of the local node to the other scheduling centers in the scheduling-center cluster, and receiving the load states of the nodes where the other scheduling centers are located;
when the load state of the local node meets a preset transfer condition, transferring at least part of the resource scheduling requests to a scheduling center of another node that meets a preset load condition; and
receiving resource scheduling requests transferred by scheduling centers of other nodes, allocating GPU resources to the transferred requests according to the current optimal scheduling policy combination of the local node, and returning execution results to the originating task modules of the other nodes.
3. The method according to claim 1, wherein allocating GPU resources to the resource scheduling requests of the task modules of the local node according to the current optimal scheduling policy combination of the local node comprises:
queuing the resource scheduling requests of the task modules of the local node according to the registration information of each task module of the local node and the current optimal scheduling policy combination of the local node;
granting GPU resource execution permission to the task module whose resource scheduling request meets the execution conditions of the local node, so that the task module executes its actual computing task; and
reclaiming the GPU resource execution permission of each task module of the local node after its actual computing task has finished.
4. A GPU scheduling method for autonomous driving, applied to a task module, the method comprising:
starting a client and registering with a scheduling center of the local node, the registration information comprising an explicit scheduling demand and GPU execution unit information;
receiving a test instruction issued by the scheduling center of the local node, and controlling a GPU execution unit to execute a test task according to the test instruction and the GPU execution unit information;
when an actual computing task needs to be executed, sending a resource scheduling request to the scheduling center of the local node according to the explicit scheduling demand;
after obtaining the GPU resource execution permission granted by the scheduling center of the local node, controlling the GPU execution unit to occupy GPU resources and execute the actual computing task according to the GPU execution unit information; and
releasing the GPU resource execution permission after the actual computing task has finished, and notifying the scheduling center of the local node.
5. The method according to claim 4, further comprising:
when the resource scheduling request is transferred to a scheduling center of another node, receiving the execution result returned by the scheduling center of the other node after it executes the actual computing task according to the GPU execution unit information and the current optimal scheduling policy combination of the other node.
6. A GPU scheduling method for autonomous driving, applied to a configuration management module, the method comprising:
when a scheduling center starts, providing the scheduling center with initial service-level scheduling rules and/or scheduling-center cluster node information;
receiving and storing the current optimal scheduling policy combination reported by the scheduling center;
when the service-level scheduling rules are updated, storing the latest updated service-level scheduling rules; and
when the scheduling center generates candidate scheduling policy combinations, providing the scheduling center with the latest service-level scheduling rules and historically stored optimal scheduling policy combinations.
7. The method according to claim 6, wherein:
the latest service-level scheduling rules are a version of the initial service-level scheduling rules adjusted for service requirements; and
the historically stored optimal scheduling policy combinations comprise a preset number of the most recently stored optimal scheduling policy combinations.
8. An autonomous driving GPU scheduling apparatus, comprising a processor and a memory storing program instructions, wherein the processor is configured to perform the GPU scheduling method for autonomous driving of any one of claims 1 to 7 when executing the program instructions.
9. A system, comprising:
a system body; and
the autonomous driving GPU scheduling apparatus of claim 8, mounted to the system body.
10. A computer-readable storage medium storing program instructions which, when executed, cause a computer to perform the GPU scheduling method for autonomous driving of any one of claims 1 to 7.
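For illustration only, the scheduling-center flow of claims 1 and 3 (generate candidate policy combinations, test them against registered task modules, pick the best one, then queue requests and grant/reclaim GPU execution permission) can be sketched as below. This is not from the patent: all names (`SchedulingCenter`, `TaskModule`, `benchmark`, the scoring function) are hypothetical stand-ins, and the real system would run actual GPU test tasks rather than a synthetic score.

```python
# Hypothetical sketch of the scheduling-center flow in claims 1 and 3.
# All class and method names are invented for illustration.
import itertools
from dataclasses import dataclass


@dataclass
class TaskModule:
    name: str
    priority: int  # derived from service-level scheduling rules

    def benchmark(self, policy):
        # Stand-in for executing a real test task under `policy`;
        # lower score stands in for better measured latency.
        return len(policy) * self.priority


class SchedulingCenter:
    def __init__(self, modules, rules):
        self.modules = modules      # task modules registered with this node
        self.rules = rules          # service-level scheduling rules
        self.best_policy = None

    def generate_candidates(self):
        # At least two candidate policy combinations built from the rules.
        return [tuple(p) for p in itertools.permutations(self.rules, 2)]

    def select_best(self):
        # Test every candidate via every registered module; keep the best.
        self.best_policy = min(
            self.generate_candidates(),
            key=lambda pol: sum(m.benchmark(pol) for m in self.modules),
        )
        return self.best_policy

    def allocate(self, requests):
        # Queue requests under the current policy, grant permission in
        # order, and reclaim it after each actual computing task finishes.
        order = sorted(requests, key=lambda r: r["priority"])
        events = []
        for req in order:
            events.append(f"grant->{req['module']}")    # run the task here
            events.append(f"reclaim<-{req['module']}")  # permission reclaimed
        return events
```

In this toy scoring function every candidate scores the same, so the first candidate wins; the point is only the shape of the loop — candidates, per-module test feedback, selection, then permission-based allocation.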
CN202511129406.4A 2025-08-13 2025-08-13 GPU scheduling methods, devices, systems, and storage media for autonomous driving Pending CN121092303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511129406.4A CN121092303A (en) 2025-08-13 2025-08-13 GPU scheduling methods, devices, systems, and storage media for autonomous driving


Publications (1)

Publication Number Publication Date
CN121092303A true CN121092303A (en) 2025-12-09

Family

ID=97894678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202511129406.4A Pending CN121092303A (en) 2025-08-13 2025-08-13 GPU scheduling methods, devices, systems, and storage media for autonomous driving

Country Status (1)

Country Link
CN (1) CN121092303A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination