Detailed Description
So that the manner in which the features and techniques of the disclosed embodiments can be understood in more detail, a more particular description of the embodiments of the disclosure, briefly summarized below, may be had by reference to the appended drawings, which are not intended to be limiting of the embodiments of the disclosure. In the following description of the technology, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the disclosed embodiments. However, one or more embodiments may still be practiced without these details. In other instances, well-known structures and devices may be shown in simplified form in order to avoid obscuring the drawings.
The terms first, second and the like in the description and in the claims of the embodiments of the disclosure and in the above-described figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe embodiments of the present disclosure. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion.
The term "plurality" means two or more, unless otherwise indicated.
In the embodiments of the present disclosure, the character "/" indicates an "or" relationship between the objects before and after it. For example, A/B represents A or B.
The term "and/or" describes an association relationship between objects and indicates that three relationships may exist. For example, "A and/or B" may represent three cases: A alone, both A and B, or B alone.
The term "corresponding" may refer to an association or binding relationship; a correspondence between A and B refers to an association or binding relationship between A and B.
Fig. 1 is a system architecture diagram of an embodiment of the present disclosure.
As shown in Fig. 1, the system architecture adopts a bottom-up hierarchical design, comprising, in order, an infrastructure layer, a resource management layer, a core scheduling layer, and an application service layer.
The infrastructure layer provides a stable and extensible underlying runtime foundation for the entire scheduling system. Based on a containerized runtime environment built on Kubernetes, it supports multiple deployment forms such as physical machines, virtual machines, and serverless. Containerization achieves resource isolation and environmental consistency. Being compatible with multiple deployment forms, the layer can meet the high-performance requirements of physical machines, achieve flexible resource scaling through serverless deployment, and suit the deployment needs of different scenarios such as intelligent driving and AI training.
The resource management layer consists of a resource monitoring subsystem, a resource prediction subsystem, and a resource scheduling subsystem. The resource monitoring subsystem collects data using Prometheus, the resource prediction subsystem predicts resource demand based on historical data, and the resource scheduling subsystem executes resource allocation and adjustment logic. The real-time load state of heterogeneous resources such as CPU, GPU, memory bandwidth, and network I/O is acquired through high-frequency collection at 500 ms intervals; resource gaps are identified in advance through resource prediction over a 15-minute window; resource allocation is then executed on a 10-second scheduling period, combining task priority and resource matching degree to achieve efficient resource utilization and avoid the resource waste caused by traditional static partitioning.
The core scheduling layer adopts the Actor model to construct scheduling units, with each task corresponding to one Actor instance, and comprises a task orchestration module, a topological sorting module, and a priority control module. The task orchestration module adapts to the directed acyclic graph (Directed Acyclic Graph, DAG) design requirements of the application service layer, completing task definition parsing, DAG model construction, and task-resource association. It receives task metadata submitted by clients and, in combination with the dual-version DAG state of the dynamic DAG manager, organizes discrete task nodes into an execution flow conforming to business logic, providing a structured task-link foundation for subsequent topological sorting and priority control. The topological sorting module executes a topological sorting algorithm on the DAG model formed by task orchestration, ensuring that tasks are executed in order according to their dependencies and avoiding deadlock caused by circular dependencies. It performs dependency analysis on nodes in the task link through topological sorting (for example, analyzing the timing constraints among desensitization, decryption, and graph-binding steps in an intelligent driving scenario) and outputs a cycle-free task execution sequence; after a dynamic DAG adjustment, topological sorting is re-executed to ensure that the changed task link still satisfies timing legitimacy.
The priority control module combines task attributes and resource states to calculate the comprehensive priority of each task and determine the execution order, ensuring that high-priority tasks obtain resources first while coordinating with the resource allocation logic of the resource management layer to avoid congestion caused by low-priority tasks occupying key resources.
The application service layer provides multi-language SDKs and standardized APIs to meet the access requirements of clients with different technology stacks, and enables visual DAG orchestration through a drag-and-drop interface, requiring no manual coding, lowering the usage threshold for business personnel, and improving task-flow design efficiency.
Through this system architecture, the infrastructure layer provides a runtime carrier for the resource management layer, the resource management layer outputs real-time resource data to the core scheduling layer, the core scheduling layer completes task scheduling and execution based on the resource data and the dynamic DAG, and the application service layer provides users with an interaction entrance to the core scheduling layer. The four layers cooperate to achieve full-link connectivity among resources, topology, tasks, and users, precisely addressing problems of traditional systems such as topology rigidity, low resource efficiency, and weak fault tolerance.
Fig. 2 is a flowchart of a data task scheduling method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes:
Step S201, an initial task graph model is built according to a task definition, and topology change instructions are dynamically received to adjust the current task graph model online.
Here, the task definition refers to a task definition submitted by a client through a RESTful API, including one or more of task metadata, dependencies, and heterogeneous resource requirements. The task definition is in JSON format. Syntax checking and dependency analysis verify the validity of the resource requirement format and whether an initial dependency cycle exists, generating an initial DAG model conforming to business logic as the initial task graph model.
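A minimal sketch of parsing and validating such a JSON task definition follows; the field names (`task_id`, `dependencies`, `resource_requirement`) are illustrative assumptions, not the patent's actual schema.

```python
import json

# Hypothetical field names for a JSON task definition; a real system would
# validate against its own schema.
REQUIRED_FIELDS = {"task_id", "dependencies", "resource_requirement"}

def parse_task_definition(raw: str) -> dict:
    """Parse a JSON task definition and validate its basic structure."""
    task = json.loads(raw)                      # syntax check
    missing = REQUIRED_FIELDS - task.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    req = task["resource_requirement"]
    if not all(isinstance(v, (int, float)) and v >= 0 for v in req.values()):
        raise ValueError("resource requirements must be non-negative numbers")
    return task

task = parse_task_definition(
    '{"task_id": "t1", "dependencies": ["t0"],'
    ' "resource_requirement": {"cpu": 4, "gpu_mem_gb": 8}}'
)
```

Dependency-cycle analysis on the parsed definitions would then run over the assembled graph, as described in the consistency-check steps below.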
Meanwhile, a mechanism capable of receiving and responding to topology change instructions in real time is established: a hot-plug protocol stack is introduced to support dynamic addition or deletion of task nodes at runtime, and the dynamic DAG maintains a dual-version storage structure, so that after a dynamic topology change instruction is received through the RESTful API, only the to-be-validated task graph is written and the executing version is not directly modified.
Step S202, data consistency verification is executed on the adjusted task graph model.
After dynamic adjustment and before the validated switch, a data consistency check is introduced to verify whether the adjusted topology satisfies constraints such as acyclicity, ensuring that no logic errors such as circular dependencies exist. This provides a safety guarantee for the system's dynamic flexibility. Through data consistency verification, system deadlock or data-logic confusion caused by erroneous topology modifications can be effectively prevented, making large-scale dynamic adjustment reliable and trustworthy.
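The acyclicity constraint can be checked with a standard topological-sort pass (Kahn's algorithm): the adjusted graph is acyclic only if the sort covers every node. A self-contained sketch, with the graph represented as an illustrative successor-list dictionary:

```python
from collections import deque

def is_acyclic(graph: dict) -> bool:
    """graph maps node -> list of successor nodes; True if no cycle exists."""
    indeg = {n: 0 for n in graph}
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    queue = deque(n for n, d in indeg.items() if d == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for s in graph[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    return visited == len(graph)   # any unvisited node implies a cycle

assert is_acyclic({"A": ["B"], "B": ["C"], "C": []})    # a chain passes
assert not is_acyclic({"A": ["B"], "B": ["A"]})         # a cycle is rejected
```

A topology change that fails this check would be rejected before the switch, leaving the executing version untouched.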
And in step S203, after verification passes, the adjusted new task graph model is switched into effect, and the switching process does not interrupt tasks executing in the system.
After verification passes, the new topology model takes effect through an atomic switching operation. During this process, the original executing-version model continues to serve tasks already running until they complete. This avoids the service interruption caused by topology changes in traditional systems and ensures high availability and continuity of service.
Step S204, task scheduling and resource allocation are carried out based on the validated new task graph model and the heterogeneous resource state, and the target execution task adapting to the heterogeneous environment is determined.
Here, based on the validated new task graph model, the multidimensional heterogeneous resource model is perceived simultaneously, and comprehensive scheduling decisions are made by calculating the matching degree between task demands and resource states in combination with task attributes. This improves the resource utilization and task execution efficiency of the whole system. Through refined resource matching, resource idling and task backlog are reduced, the use of expensive heterogeneous computing power such as GPUs (Graphics Processing Units) is optimized, and high-priority tasks can be processed in time.
In step S205, the target execution task is executed in the heterogeneous environment.
Heterogeneous hardware is uniformly mapped through a resource abstraction layer, and a task decomposition engine decomposes complex tasks into subtasks suitable for parallel execution. Throughout execution, real-time monitoring and an intelligent fault-tolerance mechanism help tasks run efficiently and stably to completion in a complex heterogeneous environment.
Thus, according to the data task scheduling method provided by the embodiment of the disclosure, an initial task graph model is constructed according to the task definition, a dynamic topology maintenance mechanism is introduced to receive and respond to topology change instructions in real time, the task graph model is adjusted online and consistency-verified at runtime, and a seamless validated switch to the new model is finally achieved. This solves the service interruption caused by static topology rigidity in traditional systems, so that in scenarios requiring fast iteration, such as intelligent driving, business logic changes can be completed without stopping the whole system. On this basis, the method further completes task scheduling and resource allocation based on the validated new model and the heterogeneous resource states, and executes the target tasks in the heterogeneous environment to ensure that the changes take effect, finally achieving flexible evolution of the topological structure, continuous and reliable task execution, and efficient adaptation of heterogeneous resources. The method can therefore adapt to complex scenarios, such as intelligent driving, that require frequent adjustment of data production links and place extremely high requirements on service continuity.
Optionally, the data consistency check in step S202 includes one or more of the following:
dependency cycle detection, which ensures that the adjusted task graph remains an acyclic structure and avoids deadlock caused by circular dependencies;
attribute validity verification, which verifies whether the resource requirements of newly added nodes and the task metadata formats are compliant: for example, verifying whether the GPU memory requirement of a newly added GPU-intensive image-feature-extraction node exceeds the total capacity of the system's heterogeneous resources, or whether formats such as task IDs and dependent-node IDs meet the client API submission specification;
concurrent conflict detection, which records the modified version of each node through a version vector, rapidly identifies conflicts during multi-user concurrent adjustment, reduces the time complexity of conflict detection, and avoids task graph data inconsistency caused by concurrent operations.
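The per-node version check can be sketched as follows: a change is accepted only if it was based on the node's current version, so a stale concurrent edit is detected in O(1) per node rather than by full traversal. Class and method names here are illustrative assumptions.

```python
# Hypothetical version-vector conflict detection for concurrent graph edits.
class VersionedGraph:
    def __init__(self):
        self.versions = {}          # node_id -> current version number

    def try_modify(self, node_id: str, based_on_version: int) -> bool:
        """Accept a change only if it was based on the node's current version."""
        current = self.versions.get(node_id, 0)
        if based_on_version != current:
            return False            # concurrent conflict: reject and retry
        self.versions[node_id] = current + 1
        return True

g = VersionedGraph()
assert g.try_modify("n1", 0)        # first edit succeeds, version becomes 1
assert not g.try_modify("n1", 0)    # stale edit based on version 0 is rejected
assert g.try_modify("n1", 1)        # edit based on the current version succeeds
```

A rejected caller would re-read the node and resubmit, which is how simultaneous adjustment of different nodes proceeds without full-graph locking.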
Thus, data consistency verification avoids the risk of system confusion caused by dynamic adjustment. If a traditional system forcibly supported dynamic adjustment, dependency cycles and attribute errors would easily cause task deadlock or resource overruns; pre-verification in this step intercepts illegal changes and ensures the validity and stability of the task graph model. Meanwhile, concurrent adjustment efficiency is improved: the performance loss of traditional full-traversal verification is avoided, simultaneous adjustment of different nodes is supported, and collaboration efficiency is improved.
The following describes how to make online adjustments to the current task graph model in connection with specific embodiments.
Fig. 3 is a flowchart of another data task scheduling method according to an embodiment of the present disclosure.
As shown in fig. 3, the method includes:
Step S301, an initial task graph model is built according to the task definition, and topology change instructions are dynamically received.
With the RESTful API as a standardized access entrance, clients can submit topology change instructions in JSON format; the instruction content contains metadata such as the change type (add/delete/modify node), the target node ID, node dependencies, and resource requirement attributes. The hot-plug protocol stack parses the format legitimacy of each change instruction, shielding access differences among clients and ensuring unified recognition of instructions by the dynamic DAG manager, so the receiving logic need not be adjusted for different client technology stacks.
Step S302, a dual-version storage structure of an executing-version task graph model and a to-be-validated task graph model is maintained, wherein the executing-version task graph model is the effective model supporting currently executing tasks, and the to-be-validated task graph model is a temporary model for receiving topology changes.
The executing-version task graph model, as the effective model supporting currently executing tasks, directly provides running tasks with core bases such as topology dependencies and execution timing, and accepts no change operations during this period;
the to-be-validated task graph model, as a temporary model for receiving topology changes, carries only change operations such as adding nodes, modifying dependencies, and deleting subtasks, and does not participate in the scheduling of currently executing tasks.
Step S303, the topology change instruction is applied to the to-be-validated task graph model to complete the preparation for online adjustment, while the executing-version task graph model continues to serve running tasks.
The dynamic DAG manager has built-in atomic methods such as add_node() for adding a node, delete_node() for deleting a node, and modify_dependency() for modifying a dependency. Each change instruction corresponds to one atomic operation, avoiding intermediate-state confusion caused by operation splitting. Each change written to the to-be-validated version synchronously updates the version number of the target node, ensuring traceability of every change operation; if subsequent verification fails, the history version can be quickly located and rolled back.
For example, taking node addition: after a client submits an instruction to add a "4D millimeter-wave radar parsing node," the dynamic DAG manager calls the add_node() method and writes node information containing the dependency relationship, resource requirements, and version number into the to-be-validated task graph model; the operation ensures concurrency safety through a lock mechanism, avoiding data overwriting caused by simultaneous execution of multiple instructions.
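A minimal sketch of this dual-version write path, assuming illustrative field names and using a simple mutex for the lock mechanism; the staged change touches only the pending version, never the executing one:

```python
import threading

# Illustrative dual-version DAG manager: atomic change methods write only to
# the pending (to-be-validated) version under a lock; the executing version
# is never touched by a change instruction.
class DynamicDAGManager:
    def __init__(self):
        self.executing = {}                 # node_id -> node info (live model)
        self.pending = {}                   # node_id -> node info (staged model)
        self._lock = threading.Lock()

    def add_node(self, node_id, deps, resources, version=0):
        with self._lock:                    # concurrency safety for the write
            self.pending[node_id] = {
                "deps": list(deps),
                "resources": dict(resources),
                "version": version + 1,     # bump version for traceability
            }

mgr = DynamicDAGManager()
mgr.add_node("radar_4d_parse", deps=["ingest"], resources={"cpu": 2})
assert "radar_4d_parse" in mgr.pending and not mgr.executing
```

Only after consistency verification would the pending model be promoted, as described in the switching step below.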
Step S304, data consistency verification is executed on the adjusted task graph model.
And step S305, after verification is passed, effective switching of the new task graph model after adjustment is realized, and the effective switching process does not interrupt the executing task in the system.
Step S306, task scheduling and resource allocation are carried out based on the validated new task graph model and the heterogeneous resource state, and the target execution task adapting to the heterogeneous environment is determined.
Step S307, executing the target execution task in the heterogeneous environment.
Thus, with the data task scheduling method provided by the embodiment of the disclosure, relying on the switching between the executing-version and to-be-validated-version task graph models, the bottleneck of static topology rigidity in traditional systems is broken: requirements such as newly added sensor parsing nodes in an intelligent driving scenario can be responded to in real time, executing tasks are switched without interruption, the real-time responsiveness of changes is improved, and the utilization of data production resources and the flexibility of the system are increased.
Further, in step S305, after verification passes, the verified new task graph model is switched into effect: the verified to-be-validated task graph model is atomically switched to become the new executing task graph model. After the switching operation completes, newly submitted tasks are scheduled according to the new executing task graph model, while tasks that began executing before the switch continue according to the original executing task graph model until completion.
The validated switch of the graph model is achieved through atomic pointer exchange, a database transaction, or a distributed consensus algorithm, ensuring the immediacy and consistency of version switching under a global view. In the embodiment of the disclosure, the verified to-be-validated task graph model is deep-copied and atomically replaces the executing version as the new executing-version task graph model; the switching process has no intermediate transition state, avoiding incomplete task graph data caused by an interrupted switch.
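The pointer-exchange variant can be sketched as follows: the checked pending model is deep-copied and swapped in with a single reference assignment, while a task started earlier keeps the model snapshot it was bound to. This is an illustrative stand-in, not the patent's implementation; names are assumptions.

```python
import copy

# Sketch of validate-then-switch: new tasks bind the current reference at
# start; a single reference assignment performs the swap, so old tasks keep
# the snapshot they bound and are unaffected by the switch.
class GraphSwitcher:
    def __init__(self, executing):
        self.executing = executing

    def bind_task(self):
        return self.executing               # task pins the current version

    def switch(self, pending):
        self.executing = copy.deepcopy(pending)   # single-reference swap

sw = GraphSwitcher({"v": 1})
old = sw.bind_task()                        # a task started before the switch
sw.switch({"v": 2})
assert old["v"] == 1                        # old task still sees its snapshot
assert sw.bind_task()["v"] == 2             # new tasks see the new model
```

The deep copy ensures later edits to the pending model cannot mutate the live executing version through shared references.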
After the switch completes, each task, when started, is bound together with the currently effective executing-version task graph model to an independent Actor instance; the Actor instance reads only the topology data of its bound version during task execution and is unaffected by later version switches. Here, the new executing-version model immediately serves all newly submitted tasks, while the old executing-version model is not destroyed immediately; as a read-only copy, it continues to serve tasks that began executing before the switch until those tasks naturally complete.
Thus, the certainty of task execution logic is ensured by letting old tasks continue running according to their original graph model: a task's dependency path and context remain fixed from start to finish, avoiding logic confusion or data errors caused by the underlying topology changing mid-execution. This simplifies troubleshooting and enhances the system's observability and traceability. Meanwhile, the atomicity and non-interruption of the switching operation allow dynamic topology adjustment to be safely executed during business peak periods without deliberately choosing off-peak windows, greatly improving the system's responsiveness to real-time changing requirements and increasing its flexibility and resource utilization.
The task scheduling and resource allocation will be described with reference to specific embodiments.
Fig. 4 is a flowchart of another data task scheduling method according to an embodiment of the present disclosure.
As shown in fig. 4, the method includes:
Step S401, an initial task graph model is built according to task definition, and topology change instructions are dynamically received to conduct online adjustment on the current task graph model.
Step S402, data consistency verification is executed on the adjusted task graph model.
And step S403, after verification is passed, effective switching of the new task graph model after adjustment is realized, and the effective switching process does not interrupt the executing task in the system.
Step S404, analyzing the new task graph model to extract the resource requirements, task attributes and the dependency relationship among the tasks.
Metadata of the validated new task graph model can be parsed by a syntax parser to extract the corresponding information: the resource_requirement field in the task definition is read to extract task requirements; task attributes such as the deadline, the historical execution success rate, and the base priority priority_base are read; and the predecessor/successor relationships among tasks are recorded by traversing the edge set of the DAG, forming a dependency adjacency list.
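A sketch of this extraction step, turning a graph model into the three structured scheduling inputs; the task dictionary layout is an illustrative assumption, with only resource_requirement and priority_base taken from the text above:

```python
# Hypothetical extraction of step S404: per-task resource requirements,
# task attributes, and a dependency adjacency list built from the edge set.
def parse_graph(tasks, edges):
    requirements = {t["id"]: t["resource_requirement"] for t in tasks}
    attributes = {t["id"]: {"deadline": t["deadline"],
                            "success_rate": t["success_rate"],
                            "priority_base": t["priority_base"]}
                  for t in tasks}
    adjacency = {t["id"]: [] for t in tasks}
    for pred, succ in edges:                # each edge: predecessor -> successor
        adjacency[pred].append(succ)
    return requirements, attributes, adjacency

tasks = [{"id": "A", "resource_requirement": {"cpu": 2}, "deadline": 30,
          "success_rate": 0.95, "priority_base": 1},
         {"id": "B", "resource_requirement": {"gpu": 1}, "deadline": 10,
          "success_rate": 0.99, "priority_base": 2}]
reqs, attrs, adj = parse_graph(tasks, edges=[("A", "B")])
assert adj["A"] == ["B"] and reqs["B"] == {"gpu": 1}
```

The adjacency list feeds the readiness determination in step S406, and the requirements and attributes feed the matching and priority calculations.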
Thus, the unstructured task graph is converted into structured data of resource requirements, task attributes, and inter-task dependencies, avoiding scheduling confusion caused by information loss and providing standardized input for subsequent scheduling; meanwhile, the clear dependency relationships provide a basis for ordered execution, solving the dependency-conflict problem caused by disordered task triggering in traditional systems.
Step S405, collecting the resource state of the heterogeneous computing environment in real time to form a global resource view.
The Prometheus + Exporter components can be adopted to collect heterogeneous resource data every 500 ms from sources including physical servers, Kubernetes Pods, and FPGA accelerator cards. The heterogeneous resource state comprises four-dimensional indicators: CPU idle core count/utilization, GPU memory free amount/computing power, memory bandwidth, and network I/O throughput, forming a four-dimensional resource view. The node data are aggregated in real time through the Flink stream processing engine to generate unified key-value pairs, forming a global resource view in a structure such as global_resource_view, e.g., {"node1": {"cpu": 8, "gpu": 2}, ...}.
Thus, 500 ms high-frequency collection keeps the resource state current, avoiding the outdated scheduling decisions caused by the slowly updated static resource tables of traditional systems. Meanwhile, the global resource view breaks the single-node view limitation, allowing the scheduler to plan resources across nodes and improving overall resource utilization.
In step S406, the task's scheduling-ready state is determined based on the dependency relationships.
Here, the Kahn algorithm is used to determine task readiness from the dependency adjacency list. The number of unfinished predecessor tasks of each task is counted as its in-degree; for example, if task B depends on A, the initial in-degree of B is 1. When a task's in-degree drops to 0, all its predecessors are finished, and the task is marked as scheduling-ready and added to the ready queue ready_queue. Each time a task completes, its successors are traversed, their in-degrees are decremented by 1, and readiness is re-checked.
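The in-degree bookkeeping just described can be sketched as follows; the class name and adjacency representation are illustrative:

```python
from collections import deque

# Ready-state tracking: a task enters the ready queue when its in-degree
# (count of unfinished predecessors) drops to zero.
class ReadyTracker:
    def __init__(self, adjacency):
        self.adjacency = adjacency
        self.indegree = {n: 0 for n in adjacency}
        for succs in adjacency.values():
            for s in succs:
                self.indegree[s] += 1
        self.ready_queue = deque(n for n, d in self.indegree.items() if d == 0)

    def complete(self, task):
        """Mark a task finished and promote any newly ready successors."""
        for s in self.adjacency[task]:
            self.indegree[s] -= 1
            if self.indegree[s] == 0:
                self.ready_queue.append(s)

rt = ReadyTracker({"A": ["B"], "B": ["C"], "C": []})
assert list(rt.ready_queue) == ["A"]        # only A has no predecessors
rt.complete("A")
assert "B" in rt.ready_queue                # B becomes ready when A finishes
```

Because only in-degree-zero tasks enter the queue, a task can never be dispatched before its dependencies complete.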
Thus, only ready tasks are allowed to enter the scheduling flow, avoiding task execution failures caused by unsatisfied dependencies. Meanwhile, the ready queue is dynamically updated so that ready tasks can be matched as soon as resources become free, reducing resource waiting time.
Step S407, for the task in the dispatch ready state, calculate the matching degree between its resource requirement and the global resource view.
A vector space distance model is adopted to calculate the matching degree between a ready task and the global resources. The task's resource demands are converted into a demand vector, the available resources of each node in the global resource view are converted into resource vectors, and the cosine distance between the two vectors is calculated via a cosine similarity formula, with a smaller distance (higher similarity) indicating a higher matching degree. Resource nodes whose matching degree is not less than a threshold can then be retained as candidate allocation objects. Specifically, the threshold may be 0.7.
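A sketch of the cosine-similarity matching, with the four dimensions assumed to be (CPU, GPU, memory bandwidth, network I/O) and the 0.7 threshold from the text; the example values are hypothetical:

```python
import math

# Matching degree as cosine similarity between a task's demand vector and a
# node's available-resource vector; values lie in [0, 1] for non-negative
# resources, and nodes at or above the 0.7 threshold become candidates.
def matching_degree(demand, available):
    dot = sum(d * a for d, a in zip(demand, available))
    norm = math.sqrt(sum(d * d for d in demand)) * \
           math.sqrt(sum(a * a for a in available))
    return dot / norm if norm else 0.0

demand = [4, 1, 10, 2]                      # task requirement vector
nodes = {"node1": [8, 2, 20, 4],            # proportional to the demand
         "node2": [1, 8, 2, 1]}             # GPU-heavy, poor CPU/bandwidth fit
candidates = {n: m for n, v in nodes.items()
              if (m := matching_degree(demand, v)) >= 0.7}
assert "node1" in candidates and "node2" not in candidates
```

Cosine similarity rewards nodes whose resource mix is proportional to the demand, which is why the GPU-heavy node2 scores low for this CPU-and-bandwidth-oriented task.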
Quantitative matching avoids traditional problems such as CPU-intensive tasks being assigned to GPU nodes or small-memory tasks occupying large-memory nodes, improving resource-utilization accuracy. Meanwhile, screening for high-matching-degree resources reduces the decision complexity of subsequent resource allocation.
In step S408, the comprehensive priority of the task is calculated by combining the matching degree and the task attribute.
A weighted summation model is adopted to calculate the comprehensive priority. The task attributes include at least the deadline and the historical execution success rate; weights are assigned to the task attributes such that their sum with the matching-degree weight equals 1.
Illustratively, in the embodiment of the present disclosure, the matching-degree weight is set to 0.3, the deadline weight to 0.5, and the historical-execution-success-rate weight to 0.2. The less time remaining before the deadline, the higher the corresponding score, and a historical execution success rate above 90 percent is scored as full marks.
Then, the comprehensive priority is calculated as: comprehensive priority = 0.3 × matching degree + 0.5 × deadline urgency score + 0.2 × historical execution success score.
The higher the score, the higher the comprehensive priority; tasks in the ready queue are arranged in descending order of comprehensive priority to form a comprehensive priority queue.
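The weighted sum above can be sketched as follows; the 0.3/0.5/0.2 weights and the 90% full-score rule come from the text, while the exact urgency scoring function is an illustrative assumption:

```python
# Composite priority = 0.3 x matching degree + 0.5 x deadline urgency
#                    + 0.2 x historical success score, each term in [0, 1].
def composite_priority(matching, remaining_ratio, success_rate):
    urgency = 1.0 - remaining_ratio                 # less time left -> more urgent
    success_score = min(success_rate / 0.9, 1.0)    # >= 90% counts as full score
    return 0.3 * matching + 0.5 * urgency + 0.2 * success_score

urgent = composite_priority(matching=0.8, remaining_ratio=0.1, success_rate=0.95)
relaxed = composite_priority(matching=0.8, remaining_ratio=0.9, success_rate=0.95)
assert urgent > relaxed                             # tighter deadline ranks higher

# Descending sort forms the comprehensive priority queue.
queue = sorted([("t1", urgent), ("t2", relaxed)], key=lambda t: -t[1])
assert queue[0][0] == "t1"
```

Any monotone urgency function would fit the description; the linear form here is just the simplest choice satisfying "less remaining time, higher score."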
Thus, tasks with urgent deadlines can obtain resources first, avoiding the key-task delays caused by traditional first-come-first-served scheduling; combining the historical success rate lowers the priority of tasks with a high failure risk, reducing resource waste.
Step S409, allocating resources for the task according to the comprehensive priority.
According to the priority-queue order, the node with the highest matching degree is selected from the candidate resource nodes, and the resource is bound through an allocation_resource(task_id) interface; the global resource view is updated immediately after allocation, and the resource state is marked as locked to avoid repeated allocation.
High-priority tasks obtain resources first, solving the conflict problem caused by multiple tasks simultaneously requesting the same resource in traditional systems; the resource state is updated in real time after allocation to avoid inconsistency between scheduling decisions and the actual resource state.
In step S410, under the condition that the current heterogeneous resource status cannot meet the task requirement, a resource backfilling or elastic capacity expansion mechanism is triggered.
Backfilling reactivates idle resources occupied by low-priority tasks, improving utilization by 15%-20% compared with traditional fixed resource allocation; elastic capacity expansion solves the problem of insufficient resources for peak tasks, avoiding task queuing during intelligent-driving data collection peaks.
Specifically, a backfill_resources() method can be called to screen low-priority tasks, release their allocated resources, and reallocate them to high-priority tasks, recording the breakpoints of the interrupted tasks. If the backfilled resources are still insufficient, an HPA (Horizontal Pod Autoscaler, Pod-level automatic scaler) request is sent through the Kubernetes API to dynamically add GPU/CPU-type Pod instances, and the global resource view is updated after expansion completes.
In step S411, the task decomposition engine splits complex tasks in the new task graph model into subtasks that can be executed in parallel on different computing units, which serve as target execution tasks.
A complex task is split according to resource type into CPU subtasks, GPU subtasks, FPGA subtasks, and the like, with each subtask inheriting the dependency relationships and part of the attributes of the original task. The split subtasks, together with tasks that did not need splitting, form the target execution tasks adapted to the heterogeneous environment.
Thus, after a complex task is split into subtasks, the subtasks can be executed in parallel across multiple heterogeneous resources, shortening execution time by 40%-60% compared with single-node execution, while avoiding the limitation of a complex task being executable only on a single resource.
In step S412, each target execution task is packaged as an independent concurrent execution unit.
Each target execution task is packaged into an independent Actor instance containing the task code, resource requirements, dependency callback functions, and the like; Actor instances communicate through message queues, with memory and computing resources completely isolated between instances.
Thus, a single task exception does not crash the whole task graph, improving system fault tolerance; the lightweight nature of the Actor model supports thousands of target tasks running simultaneously, meeting the concurrency requirements of intelligent-driving mass data processing.
Step S413, triggering the sequential execution of the concurrent execution units in a dependent driving mode according to the task dependency relationship in the new task graph model.
When one Actor instance completes execution, it sends a dependency-satisfied event to the Actors of its successor tasks through a callback function; after a successor's Actor receives the event, it automatically starts executing once all of its predecessor dependencies are satisfied.
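The dependency-driven triggering can be sketched with a plain-object stand-in for the Actor and message-queue machinery; class and scenario names (a sensor-fusion task awaiting two branches) are illustrative:

```python
# Hypothetical dependency-driven trigger: each completed predecessor sends a
# "dependency satisfied" event; the successor starts only when all
# predecessors have reported in.
class TaskActor:
    def __init__(self, name, num_deps, on_start):
        self.name = name
        self.pending_deps = num_deps        # unfinished predecessor count
        self.on_start = on_start
        self.started = False

    def notify_dependency_met(self):
        self.pending_deps -= 1
        if self.pending_deps == 0 and not self.started:
            self.started = True
            self.on_start(self.name)        # all predecessors done: run now

started = []
post = TaskActor("fuse", num_deps=2, on_start=started.append)
post.notify_dependency_met()                # e.g. camera branch finished
assert not post.started                     # still waiting on the other branch
post.notify_dependency_met()                # e.g. lidar branch finished
assert started == ["fuse"]                  # starts the instant deps are met
```

Because the start fires inside the completion event itself, no polling interval sits between a dependency being satisfied and the successor launching.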
Compared with checking dependencies by timed polling, this event-driven approach reduces trigger latency and improves the overall efficiency of the task flow.
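The dependency-driven triggering can be sketched as follows; the data structures are illustrative assumptions, showing only how completion events release successors once all predecessors have fired:

```python
class DependencyTrigger:
    """Event-driven triggering: a finishing task fires a callback toward
    its successors; a successor starts once all predecessors are done."""
    def __init__(self, dag):
        # dag maps each task to the set of tasks it depends on
        self.pending = {t: set(deps) for t, deps in dag.items()}
        self.started = []

    def on_complete(self, task):
        # callback fired when `task` finishes: a dependency-satisfied event
        for succ, deps in self.pending.items():
            deps.discard(task)
            if not deps and succ not in self.started:
                self.started.append(succ)   # all predecessors met: start

dag = {"B": {"A"}, "C": {"A", "B"}}
trig = DependencyTrigger(dag)
trig.on_complete("A")   # releases B (C still waits on B)
trig.on_complete("B")   # releases C
print(trig.started)  # ['B', 'C']
```

No polling loop is involved: successors are examined only at the moment a completion event arrives, which is what keeps the trigger latency low.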
In step S414, each concurrent execution unit is scheduled to run on the heterogeneous resource that matches its resource requirement.
The resource abstraction layer uniformly maps physical and virtual resources into standard computing units, and dispatches each Actor instance to the corresponding standard computing unit based on the matching-degree result of step S407. No separate scheduling tools need to be deployed for the CPU, GPU, FPGA, and so on, which reduces system complexity; tasks run on matched resources, avoiding inefficiencies such as a CPU emulating GPU computation.
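The matching step can be sketched as a simple selection over standard computing units; the unit fields and the scoring rule below are hypothetical stand-ins for the matching-degree result of step S407:

```python
def schedule(actor_req, units):
    """Pick the standard computing unit whose type satisfies the actor's
    resource requirement, preferring the highest match score."""
    candidates = [u for u in units
                  if u["type"] == actor_req["type"]
                  and u["free_mem"] >= actor_req["mem"]]
    if not candidates:
        return None                     # no matching heterogeneous resource
    return max(candidates, key=lambda u: u["score"])

units = [
    {"id": "gpu-0", "type": "GPU", "free_mem": 16, "score": 0.9},
    {"id": "gpu-1", "type": "GPU", "free_mem": 8,  "score": 0.7},
    {"id": "cpu-0", "type": "CPU", "free_mem": 64, "score": 0.8},
]
print(schedule({"type": "GPU", "mem": 12}, units)["id"])  # gpu-0
```

Because all resource kinds are expressed as the same unit record, one scheduler handles CPU, GPU, and FPGA uniformly instead of one tool per resource type.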
By adopting the data task scheduling method provided by the embodiments of the present disclosure, a dynamic topology maintenance mechanism is introduced so that the task graph model can be adjusted in real time and verified for consistency online in response to service change requirements, finally achieving hot update and smooth switching of the new model. This mechanism helps to alleviate the service-interruption problem caused by the fixed topology of conventional scheduling systems, so that in fast-iteration scenarios such as intelligent driving, service logic can be updated without stopping the whole pipeline. Further, the system performs resource-aware intelligent scheduling and task decomposition based on the validated new task graph model and multidimensional heterogeneous resource states, improving the scheduling rationality of heterogeneous resource environments and promoting the effective utilization of computing resources such as GPUs. Finally, by executing the optimally scheduled tasks in the heterogeneous environment and combining real-time monitoring with an elastic fault-tolerance mechanism, the recovery capability of the system under local faults is enhanced, the overall task-delay risk caused by node failure is reduced, and technical support is provided for the continuity, flexibility, and stability of complex data production flows.
Fig. 5 is a flowchart of another data task scheduling method according to an embodiment of the present disclosure.
As shown in fig. 5, the data task scheduling method includes:
Step S501, an initial task graph model is built according to task definition, and topology change instructions are dynamically received to perform online adjustment on the current task graph model.
Step S502, data consistency check is executed on the adjusted task graph model.
Step S503, after the check is passed, effective switching to the adjusted new task graph model is performed, and the switching process does not interrupt tasks being executed in the system.
Step S504, task scheduling and resource allocation are carried out based on the validated new task graph model and the heterogeneous resource state, and the target execution task adapting to the heterogeneous environment is determined.
In step S505, the target execution task is executed in the heterogeneous environment.
In step S506, the task execution state is detected by a fault detection mechanism, wherein the fault detection mechanism comprises one or more of heartbeat detection, task timeout detection and data consistency detection.
Here, monitoring of the task execution state is realized by a fault detection mechanism with multi-level cooperative detection.
For heartbeat detection, based on the Actor communication mechanism, the concurrent execution unit of each task sends a heartbeat packet to the core scheduling layer every 100 ms; the packet contains information such as the task ID, current progress, and resource occupation. If the scheduling layer misses the heartbeat packet three consecutive times, the task is marked as a suspected fault and further verification is triggered.
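The three-consecutive-misses rule can be sketched as follows; the class and method names are illustrative, with the 100 ms tick assumed to be driven by an external timer:

```python
class HeartbeatMonitor:
    """Mark a task as a suspected fault after 3 consecutive missed
    heartbeats (one heartbeat expected per tick, e.g. every 100 ms)."""
    MAX_MISSES = 3

    def __init__(self):
        self.misses = {}
        self.suspected = set()

    def on_heartbeat(self, task_id, progress, resource_usage):
        self.misses[task_id] = 0        # packet received: reset the counter
        self.suspected.discard(task_id)

    def on_tick(self, task_id):
        # called each interval in which no heartbeat packet arrived
        self.misses[task_id] = self.misses.get(task_id, 0) + 1
        if self.misses[task_id] >= self.MAX_MISSES:
            self.suspected.add(task_id)  # trigger further verification

mon = HeartbeatMonitor()
mon.on_heartbeat("t1", progress=0.4, resource_usage={"gpu": 0.8})
mon.on_tick("t1"); mon.on_tick("t1"); mon.on_tick("t1")
print("t1" in mon.suspected)  # True
```

A single late heartbeat resets the counter, so only sustained silence, not one dropped packet, escalates to a suspected fault.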
For task timeout detection, a dynamic timeout threshold is preset for each task by combining the deadline in the task attributes with the historical execution time. The timeout monitor of the scheduling layer compares the actual execution time of the task with the threshold in real time, and a timeout fault is determined when the execution time exceeds the threshold while progress is below 90%.
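One way to combine the deadline with historical runtimes into a dynamic threshold is sketched below; the particular combination (1.5x the historical average, capped by the deadline) is an assumption for illustration:

```python
def is_timeout_fault(elapsed, progress, deadline, history):
    """Timeout fault: elapsed time exceeds a dynamic threshold derived
    from the deadline and historical runtimes, while progress < 90%."""
    avg_hist = sum(history) / len(history)
    threshold = min(deadline, 1.5 * avg_hist)   # illustrative combination
    return elapsed > threshold and progress < 0.9

# 120 s elapsed at 50% progress, threshold = min(100, 1.5 * 60) = 90 s
print(is_timeout_fault(elapsed=120, progress=0.5,
                       deadline=100, history=[60, 70, 50]))  # True
```

The progress guard prevents flagging a task that is over its threshold but nearly finished, where killing and retrying it would cost more than letting it complete.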
For data consistency detection, the hash value of the input data is compared with the expected value, and the metadata and business rules of the output result are checked. If either check fails, a data consistency fault is determined.
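Both checks can be sketched with the standard library; SHA-256 and the rule-as-callable interface are illustrative choices, not specified by the disclosure:

```python
import hashlib

def check_consistency(input_bytes, expected_hash, output_meta, rule):
    """Data consistency detection: compare the input's hash with the
    expected value, then validate output metadata against a business
    rule. Failing either check is a data consistency fault."""
    actual = hashlib.sha256(input_bytes).hexdigest()
    if actual != expected_hash:
        return False                 # input was corrupted or substituted
    return rule(output_meta)         # output metadata / business rule check

data = b"frame-0001"
expected = hashlib.sha256(data).hexdigest()
ok = check_consistency(data, expected,
                       {"rows": 1000}, lambda m: m["rows"] > 0)
print(ok)  # True
```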
Step S507, if a task execution fault is detected, a recovery strategy is triggered according to the fault type, wherein the fault type at least comprises temporary faults and permanent faults.
Temporary faults are caused by self-recoverable factors such as resource fluctuation and transient network interruption; they account for about 70% of all faults. Permanent faults are caused by non-self-recoverable factors such as task code errors and permanent resource damage; they account for about 30% of all faults.
Differential recovery strategies are adopted for temporary faults and permanent faults, and the differential recovery strategies specifically comprise the following steps:
Where the fault type is a temporary fault, the recovery policy includes automatic retry, resource warm-up, and a retry limit. Automatic retry means calling the retry_task() interface to put the task back into the scheduling ready queue, with priority dispatch to standby resource nodes. Resource warm-up means activating the cache of the target node through the resource management layer before retrying, to avoid the cost of a second cold start. The retry limit bounds the number of retries to avoid infinite loops, with intervals set by a back-off strategy. Specifically, the number of retries is 3, with intervals of 1 s, 3 s, and 5 s in sequence.
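The bounded back-off retry can be sketched as below; treating RuntimeError as the temporary-fault signal and the injectable `sleep` parameter are illustrative simplifications:

```python
import time

RETRY_INTERVALS = [1, 3, 5]   # seconds; retry limit = 3

def retry_task(task_fn, sleep=time.sleep):
    """Automatic retry for temporary faults: re-run the task up to 3
    times, with back-off intervals of 1 s, 3 s, and 5 s between tries."""
    for i in range(len(RETRY_INTERVALS) + 1):   # 1 initial try + 3 retries
        try:
            return task_fn()
        except RuntimeError:                    # treat as a temporary fault
            if i == len(RETRY_INTERVALS):
                raise                           # retries exhausted: escalate
            sleep(RETRY_INTERVALS[i])

calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient")
    return "ok"

print(retry_task(flaky, sleep=lambda s: None))  # ok
```

The increasing intervals give a transiently overloaded resource time to recover, while the hard limit of 3 retries prevents a persistently failing task from looping forever.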
Where the fault type is a permanent fault, the recovery strategy includes fault isolation, alerting and degradation, and root-cause recording. Fault isolation means immediately stopping the resource occupation of the faulty task to prevent error propagation; alerting and degradation means pushing an alarm to operation and maintenance personnel through the monitoring system and triggering task degradation; root-cause recording means writing the fault details into an audit log to provide a basis for subsequent optimization.
Thus, by automatically and intelligently retrying temporary faults, common transient fluctuations in the environment can be handled effectively, preventing a large number of tasks that could succeed from failing, while avoiding wasting system resources on persistent failures and leaving those resources for other healthy tasks. Permanent faults can be quickly identified and alarmed, so that operation and maintenance personnel can intervene rapidly and locate the root cause. The degradation strategy ensures that the core, or most, of the system functions continue to operate even if some links fail.
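The three permanent-fault actions can be sketched as one handler; the callback-based interface (release, alert, audit log) is an illustrative decomposition, not the disclosed API:

```python
def handle_permanent_fault(task, release_fn, alert_fn, audit_log):
    """Permanent-fault handling: isolate the task (release its
    resources), alert and degrade, and record the root cause."""
    release_fn(task["id"])                              # fault isolation
    alert_fn(f"task {task['id']} failed permanently; degrading")
    audit_log.append({"task": task["id"],               # root-cause record
                      "cause": task["error"]})

released, alerts, log = [], [], []
handle_permanent_fault({"id": "t9", "error": "task code error"},
                       released.append, alerts.append, log)
print(released, log[0]["cause"])  # ['t9'] task code error
```

Releasing resources first stops error propagation; the alert and the audit record then serve the human-facing paths (intervention and later optimization) without blocking the scheduler.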
Fig. 6 is a schematic diagram of a data task scheduling system provided in an embodiment of the present disclosure.
As shown in fig. 6, the system comprises:
the task graph management module 601 is configured to construct an initial task graph model according to task definitions, and dynamically receive topology change instructions to perform online adjustment on the current task graph model;
a consistency check module 602 configured to perform data consistency check on the adjusted task graph model;
the topology switching module 603 is configured to implement effective switching to the adjusted new task graph model after the check by the consistency check module 602 is passed, and the switching process does not interrupt tasks being executed in the system;
The resource scheduling module 604 is configured to perform task scheduling and resource allocation based on the validated new task graph model and the heterogeneous resource state, and determine a target execution task adapted to the heterogeneous environment;
The task execution module 605 is configured to execute a target execution task in a heterogeneous environment.
Therefore, by introducing a dynamic topology maintenance mechanism, the data task scheduling system can respond online to service change requirements, adjust the task graph model in real time with consistency verification, and finally achieve hot update and smooth switching of the new model. This mechanism helps to alleviate the service-interruption problem caused by the fixed topology of conventional scheduling systems, so that in fast-iteration scenarios such as intelligent driving, service logic can be updated without stopping the whole pipeline. Further, the system performs resource-aware intelligent scheduling and task decomposition based on the validated new task graph model and multidimensional heterogeneous resource states, improving the scheduling rationality of heterogeneous resource environments and promoting the effective utilization of computing resources such as GPUs. Finally, by executing the optimally scheduled tasks in the heterogeneous environment and combining real-time monitoring with an elastic fault-tolerance mechanism, the recovery capability of the system under local faults is enhanced, the overall task-delay risk caused by node failure is reduced, and technical support is provided for the continuity, flexibility, and stability of complex data production flows.
The embodiment of the disclosure also provides a computer readable storage medium storing computer executable instructions configured to perform a data task scheduling method as described above.
Embodiments of the present disclosure may be embodied in a software product stored on a storage medium, including one or more instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of a method according to the embodiments of the present disclosure. The storage medium may be a non-transitory storage medium capable of storing program code, such as a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description and the drawings illustrate embodiments of the disclosure sufficiently to enable those skilled in the art to practice them. Other embodiments may involve structural, logical, electrical, process, and other changes. The embodiments represent only possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in, or substituted for, those of others. Moreover, the terminology used in the present application is for the purpose of describing embodiments only and is not intended to limit the claims. As used in the description of the embodiments and the claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this disclosure is meant to encompass any and all possible combinations of one or more of the associated listed items. Furthermore, when used in the present disclosure, the terms "comprises," "comprising," and/or variations thereof mean that the recited features, integers, steps, operations, elements, and/or components are present, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in a process, method, or apparatus comprising said element. In this context, each embodiment may be described with emphasis on its differences from the other embodiments, and for the same or similar parts the various embodiments may refer to one another. For the methods, products, and the like disclosed in the embodiments, where they correspond to the method sections disclosed in the embodiments, reference may be made to the description of the method sections for relevant details.
Those of skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. The skilled artisan may use different methods for each particular application to achieve the described functionality, but such implementation should not be considered to be beyond the scope of the embodiments of the present disclosure. It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the embodiments disclosed herein, the disclosed methods and articles of manufacture (including but not limited to devices, apparatuses, etc.) may be practiced in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units may be merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other form. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to implement the present embodiment. In addition, each functional unit in the embodiments of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. In the description corresponding to the flowcharts and block diagrams in the figures, operations or steps corresponding to different blocks may also occur in different orders than that disclosed in the description, and sometimes no specific order exists between different operations or steps. For example, two consecutive operations or steps may actually be performed substantially in parallel, they may sometimes be performed in reverse order, which may be dependent on the functions involved. Each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.