CN112965797B - Combined priority scheduling method for complex tasks under Kubernetes environment - Google Patents
- Publication number
- CN112965797B (application CN202110244427.6A)
- Authority
- CN
- China
- Prior art keywords
- task
- priority
- tasks
- pod
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/48—Indexing scheme relating to G06F9/48
- G06F2209/484—Precedence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5021—Priority
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The combined priority scheduling method for complex tasks in a Kubernetes environment is implemented through the following steps: a) calculate the actual parallelism of each group of tasks; b) obtain the task criticality; c) obtain the user priority; d) obtain the user's dynamic priority; e) calculate the task urgency; f) normalize the parallelism and the urgency; g) compute the priority value; h) sort and schedule the pods. Because task parallelism is considered when the priority is set, the method avoids the task execution failures that occur when other tasks occupy node resources first and leave parallel tasks unable to obtain resources. Because task urgency is also considered, an urgent task can preempt the resources held by non-urgent tasks when node resources are insufficient, so that the urgent task executes successfully.
Description
Technical field
The invention relates to a combined priority scheduling method, and more specifically to a combined priority scheduling method for complex tasks in a Kubernetes environment.
Background
As the technology with the greatest development potential in the new era, artificial intelligence has been applied and developed in many fields. Not all artificial intelligence computation currently runs on cloud platforms in the strict sense, but cloud computing remains the basic computing platform for artificial intelligence and a convenient way to integrate its capabilities into a vast number of applications. Cloud computing is a service related to information technology, software, and the Internet that provides dynamic, easily scalable resources over the Internet; these resources are usually virtualized, and the "cloud" refers to this shared pool of computing resources. Artificial intelligence not only enriches the features of cloud computing services but also makes them fit business scenarios better and further frees up manpower. Machine learning, as one method of realizing artificial intelligence, is the focus of artificial intelligence technology. For large-scale data and computing tasks, machine learning usually requires thousands of iterative computations, so its demand for cloud computing resources is very high, and the time cost of training and optimizing models is also high. To complete machine learning tasks quickly with limited resources, cloud computing resources must be scheduled and allocated reasonably and effectively.
Kubernetes is a popular open-source container cluster management platform in the cloud computing field with very complete cluster management capabilities. A pod is the smallest unit that can be created and deployed in Kubernetes and contains one or more containers. A task is mapped to one or more pods in Kubernetes; because tasks have an execution order and need priorities, pods need priorities as well. Kubernetes divides pods into three QoS (quality of service) classes: Guaranteed, with the highest priority; BestEffort, with the lowest priority; and Burstable, with a priority between the two. In addition to QoS classes, Kubernetes allows users to define pod priorities. A priority definition must be submitted to Kubernetes, with a value assigned to its value attribute. Once the priority is defined, a pod can declare that it uses it.
In the default Kubernetes priority definition, value must be assigned by the user. For more complex tasks, several influencing factors must be considered, and how to assign an appropriate priority becomes the key problem.
Summary of the invention
To overcome the shortcomings of the technical problems described above, the invention provides a combined priority scheduling method for complex tasks in a Kubernetes environment.
In the combined priority scheduling method for complex tasks in a Kubernetes environment, let the tasks to be scheduled through the Kubernetes resource management platform be task_1, task_2, …, task_n, n tasks in total. These n tasks are divided into q groups, 1 ≤ q ≤ n. Let the i-th group contain h_i tasks, i ≤ q, h_i ≤ n; that is, the parallelism of the i-th group is h_i, and the h_i tasks in the i-th group are denoted task_{i1}, task_{i2}, …, task_{i h_i}. The method is characterized in that it is implemented through the following steps:
a). Calculate the actual parallelism of each group of tasks. Let the hardware resources contain m worker nodes, each with c CPU cores available for task computation; the maximum task concurrency supported by the hardware is then m × c. Between the task parallelism h_i of a group and the maximum task concurrency supported by the hardware, the smaller value is taken. The actual parallelism P_i of the i-th group is therefore obtained from formula (1):
P_i = min(h_i, m × c) (1)
This is repeated until the actual parallelism of every task group has been obtained.
b). Obtain the task criticality. Among all tasks task_1, task_2, …, task_n, critical tasks are assigned a high criticality coefficient H and the remaining tasks a low criticality coefficient W, with H > W. For the h_i tasks task_{i1}, task_{i2}, …, task_{i h_i} in the i-th group, the criticality k_{ij} of the j-th task task_{ij} is obtained with the selection function (2):
k_{ij} = choice(H, W) (2)
where i ≤ q, j ≤ h_i, H ∈ N*, W ∈ N*.
c). Obtain the user priority. Every task is assigned a user priority U. Let the h_i tasks in the i-th group be task_{i1}, task_{i2}, …, task_{i h_i}, with assigned user priorities Pr_{i1}, Pr_{i2}, …, Pr_{i h_i}. The user priority of the j-th task task_{ij} in the i-th group is obtained from formula (3):
U_{ij} = Pr_{ij} (3)
where i ≤ q, j ≤ h_i, Pr_{ij} ∈ N*.
d). Obtain the user's dynamic priority. The dynamic priority D is determined by the task idle time L: the smaller the idle time, the higher the dynamic priority. For the h_i tasks task_{i1}, task_{i2}, …, task_{i h_i} in the i-th group, the dynamic priority D_{ij} of the j-th task task_{ij} is obtained from formula (4):
D_{ij} = ⌈100 / L_{ij}⌉ (4)
where ⌈ ⌉ is the round-up (ceiling) function and L_{ij} is the idle time of the j-th task task_{ij} in the i-th group, with 1 ≤ L_{ij} ≤ 50.
e). Calculate the task urgency. The urgency J_{ij} of the j-th task task_{ij} in the i-th group is calculated from formula (5):
J_{ij} = k_{ij} + U_{ij} + D_{ij} (5)
f). Normalize the parallelism and the urgency. Let the range of the task parallelism be [P_min, P_max] and the range of the urgency be [J_min, J_max]. The actual parallelism P_i of the i-th group is normalized with formula (6):
P_{i-normal} = (P_i - P_min) / (P_max - P_min) (6)
The urgency J_{ij} of the j-th task task_{ij} in the i-th group is normalized with formula (7):
J_{ij-normal} = (J_{ij} - J_min) / (J_max - J_min) (7)
g). Compute the priority value. A task can be mapped to a single pod or to several pods; several pods form a pod group in which each pod executes one subtask. The priority of a task, mapped into Kubernetes, is the priority of that single pod or pod group. The priority value is scaled up by a suitable factor, and the priority V_{ij} of the j-th task task_{ij} in the i-th group is obtained from formula (8):
V_{ij} = k′ × (P_{i-normal} + J_{ij-normal}) (8)
where k′ is the scaling factor, P_{i-normal} is the normalized actual parallelism of the i-th group, and J_{ij-normal} is the normalized urgency of the j-th task task_{ij} in the i-th group.
h). Sort and schedule the pods. The single pod or pod group corresponding to the j-th task task_{ij} in the i-th group is sorted by the priority V_{ij} of its task, with higher priorities first and lower priorities last; the single pod or pod group at the front is scheduled first.
In step h) of the method, for a pod group, each pod in the group corresponds to one subtask, and the priorities within the group are set through the following steps:
h-1). Build a directed acyclic graph over the pods in the group from the subtask dependencies.
h-2). In the directed acyclic graph, start from any vertex with in-degree 0, follow directed edges to find a vertex with out-degree 0, and push the pod corresponding to that vertex onto the stack; then perform step h-3).
h-3). Return to the previous vertex. If its out-degree is 0 once the vertices already on the stack are excluded, push its pod onto the stack. If its out-degree is not 0 once the vertices already on the stack are excluded, follow a directed edge that does not lead to a vertex already on the stack to find the next vertex with out-degree 0, and push the pod corresponding to that vertex onto the stack. Repeat this step until the pods of all vertices in the directed acyclic graph are on the stack.
h-4). Once all vertices are on the stack, pop them. By the first-in, last-out principle of the stack, a pod pushed later has a higher priority than a pod pushed earlier, which yields the priority sequence of the pods in the pod group.
In step h) of the method, if two tasks have equal priority values, they are sorted according to the following rules:
h-1-1). Sort by criticality coefficient. For two tasks with equal priority values, their criticality coefficients are compared first. If the coefficients differ, the single pod or pod group of the task with the higher coefficient is placed first and that of the task with the lower coefficient is placed after it. If the coefficients are equal, perform step h-1-2).
h-1-2). Sort by user priority. For two tasks with equal priority values and equal criticality coefficients, their user priorities are compared. If the user priorities differ, the single pod or pod group of the task with the higher user priority is placed first and that of the task with the lower user priority is placed after it. If the user priorities are equal, perform step h-1-3).
h-1-3). Sort by dynamic priority. For two tasks with equal priority values, criticality coefficients, and user priorities, their dynamic priorities are compared. If the dynamic priorities differ, the single pod or pod group of the task with the higher dynamic priority is placed first and that of the task with the lower dynamic priority is placed after it. If the dynamic priorities are equal, the two tasks are placed one after the other in random order.
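The tie-breaking rules h-1-1) to h-1-3) amount to a lexicographic sort. The following Python sketch is one possible reading, not the patent's implementation: the `Task` record, the function name `schedule_order`, and the sample values are hypothetical, and it assumes that a numerically larger user priority means a higher priority, which is how formula (5) treats it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical record holding the quantities the tie-break rules compare."""
    name: str
    priority: int          # V, from formula (8)
    criticality: int       # k, from formula (2)
    user_priority: int     # U, from formula (3)
    dynamic_priority: int  # D, from formula (4)


def schedule_order(tasks, rng=None):
    """Sort by V, then k, then U, then D (all descending), then randomly."""
    rng = rng or random.Random()
    return sorted(
        tasks,
        key=lambda t: (
            -t.priority,
            -t.criticality,       # h-1-1
            -t.user_priority,     # h-1-2
            -t.dynamic_priority,  # h-1-3
            rng.random(),         # remaining ties: random order
        ),
    )


# Hypothetical sample: four tasks share V = 100 and differ only in k, U, D.
sample = [
    Task("a", 100, 5, 1, 20),
    Task("b", 100, 10, 1, 20),
    Task("c", 100, 10, 2, 20),
    Task("d", 200, 5, 1, 2),
    Task("e", 100, 10, 2, 34),
]
print([t.name for t in schedule_order(sample)])  # ['d', 'e', 'c', 'b', 'a']
```

Negating each field turns Python's ascending sort into the descending order the rules call for, and the trailing random number only matters when all four fields are equal.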
In the method, the scaling factor k′ in step g) is 1,000,000.
The beneficial effects of the invention are as follows. For complex tasks such as machine learning, because task parallelism is considered when the priority is set, the method avoids the task execution failures that occur when other tasks occupy node resources first and leave parallel tasks unable to obtain resources. Because task urgency is also considered, an urgent task can preempt the resources held by non-urgent tasks when node resources are insufficient, so that the urgent task executes successfully. A priority setting method that takes both points into account can effectively raise the task execution success rate when complex tasks are scheduled onto node resources. In addition, for group scheduling of machine learning tasks, a second level of priority setting handles the dependencies among the pods in a group.
Description of drawings
Fig. 1 shows the task scheduling mapping process in Kubernetes;
Fig. 2 shows the overall task structure;
Fig. 3 shows the task parallelism; group A contains task1 to task3 and group B contains task4 to task8;
Fig. 4 shows the directed acyclic graph of the dependencies within a pod group.
Detailed description
The invention is further described below with reference to the drawings and an embodiment.
Task parallelism: measures the number of tasks executing in parallel at a given moment. Whether the tasks specified by a user can run concurrently depends on the number of worker nodes and the number of CPU cores on each worker node available for task computation. Let m be the number of worker nodes currently executing tasks and c the number of CPU cores per worker node available for task computation; the maximum task concurrency supported by the hardware is then m × c, a positive integer. Let h be the task parallelism; it is determined by the number of subtasks of each task in the serial sequence of tasks and is a positive integer. Between the task parallelism h and the maximum task concurrency m × c supported by the hardware, the smaller value is taken, and that value is the actual parallelism P.
Task urgency: each task has an urgency J, which combines a fixed priority F and a dynamic priority D. The fixed priority F is determined by the task criticality k and the user priority U. The dynamic priority D is determined by the task idle time L: the smaller the idle time, the higher the dynamic priority. The set of critical tasks is assigned a high criticality coefficient H and the remaining tasks a low criticality coefficient W; the coefficients are positive integers with H > W. Each task in a batch is assigned a unique user priority U, a positive integer; the user priorities of a group of tasks can be assigned in increasing order starting from 1.
Fig. 1 shows the task scheduling mapping process in Kubernetes. A pod is the smallest unit that can be created and deployed in Kubernetes and contains one or more containers; a task is mapped to a single pod or a pod group in Kubernetes.
Fig. 2 shows the overall task structure. task1, task2, and task3 form one group (group 1) with parallelism 3; task4, task5, task6, task7, and task8 form another group (group 2) with parallelism 5. There are 3 nodes, each with 2 cores.
The actual parallelism P_1 of group 1 is obtained from formula (1):
P_1 = min(h_1, m × c) = min(3, 3 × 2) = 3
Likewise, the actual parallelism P_2 of group 2 is obtained from formula (1):
P_2 = min(h_2, m × c) = min(5, 3 × 2) = 5
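The two evaluations of formula (1) can be reproduced with a short Python sketch. The function name `actual_parallelism` is a hypothetical choice; the inputs are the node count, core count, and group sizes given in the embodiment.

```python
def actual_parallelism(h: int, nodes: int, cores_per_node: int) -> int:
    """Formula (1): P_i = min(h_i, m * c)."""
    return min(h, nodes * cores_per_node)


# Embodiment values: m = 3 worker nodes with c = 2 cores each.
m, c = 3, 2
p1 = actual_parallelism(3, m, c)  # group 1: task1..task3
p2 = actual_parallelism(5, m, c)  # group 2: task4..task8
print(p1, p2)  # 3 5
```

A group larger than the hardware limit would be capped: with the same cluster, a group of 8 tasks would have an actual parallelism of 6.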
As shown in Table 1, the three parallel tasks of group 1, task1, task2, and task3, have user priorities 1, 2, and 3. task1 and task3 belong to the critical task set and are given the high criticality coefficient 10; task2 belongs to the non-critical task set and is given the low criticality coefficient 5. Their idle times obtained from the system are 6, 3, and 2. The five parallel tasks of group 2, task4, task5, task6, task7, and task8, have user priorities 1, 2, 3, 4, and 5. task4 and task5 belong to the non-critical task set and are given the low criticality coefficient 5; task6, task7, and task8 belong to the critical task set and are given the high criticality coefficient 10. Their idle times obtained from the system are 4, 5, 3, 2, and 2.
Table 1
Task | Group | User priority U | Criticality coefficient k | Idle time L |
---|---|---|---|---|
task1 | 1 | 1 | 10 | 6 |
task2 | 1 | 2 | 5 | 3 |
task3 | 1 | 3 | 10 | 2 |
task4 | 2 | 1 | 5 | 4 |
task5 | 2 | 2 | 5 | 5 |
task6 | 2 | 3 | 10 | 3 |
task7 | 2 | 4 | 10 | 2 |
task8 | 2 | 5 | 10 | 2 |
Formula (4) gives the dynamic priorities D_{11}, D_{12}, D_{13} of the three tasks task1, task2, task3 in group 1 as 17, 34, and 50, and the dynamic priorities D_{21}, D_{22}, D_{23}, D_{24}, D_{25} of the five tasks task4, task5, task6, task7, task8 in group 2 as 25, 20, 34, 50, and 50.
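The eight dynamic priorities above are all consistent with D = ⌈100 / L⌉ applied to the stated idle times. The Python sketch below rests on that reading, which is an assumption inferred from the worked values and the stated ceiling function; the function name `dynamic_priority` is hypothetical.

```python
def dynamic_priority(idle_time: int) -> int:
    """Assumed form of formula (4): D = ceil(100 / L), with 1 <= L <= 50."""
    if not 1 <= idle_time <= 50:
        raise ValueError("idle time must lie in [1, 50]")
    # Integer ceiling division avoids floating-point rounding.
    return -(-100 // idle_time)


# Idle times from the embodiment.
idle_times = {
    "task1": 6, "task2": 3, "task3": 2,
    "task4": 4, "task5": 5, "task6": 3, "task7": 2, "task8": 2,
}
dynamic = {name: dynamic_priority(L) for name, L in idle_times.items()}
print(list(dynamic.values()))  # [17, 34, 50, 25, 20, 34, 50, 50]
```

Under this reading, the allowed idle-time range of 1 to 50 maps to dynamic priorities between 100 and 2.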
Then, formula (5) gives the task urgencies J_{11}, J_{12}, J_{13} of the three tasks task1, task2, task3 in group 1 as 24, 41, and 55, and the task urgencies J_{21}, J_{22}, J_{23}, J_{24}, J_{25} of the five tasks task4, task5, task6, task7, task8 in group 2 as 31, 27, 47, 64, and 65.
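Formula (5) is a plain sum, sketched below in Python with the criticality coefficients, user priorities, and dynamic priorities stated in the embodiment. The function name `urgency` is hypothetical. Evaluated this way, six of the eight results match the urgencies listed in the text; for task1 and task3 the sum gives 28 and 63, whereas the text lists 24 and 55, so those two values should be checked against the original publication.

```python
def urgency(k: int, u: int, d: int) -> int:
    """Formula (5): J = k + U + D."""
    return k + u + d


# (criticality k, user priority U, dynamic priority D) from the embodiment.
inputs = {
    "task1": (10, 1, 17), "task2": (5, 2, 34), "task3": (10, 3, 50),
    "task4": (5, 1, 25), "task5": (5, 2, 20), "task6": (10, 3, 34),
    "task7": (10, 4, 50), "task8": (10, 5, 50),
}
computed = {name: urgency(*kud) for name, kud in inputs.items()}
print(list(computed.values()))  # [28, 41, 63, 31, 27, 47, 64, 65]
```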
The parallelism and the urgency are then normalized. Let the parallelism range be P_min = 1 to P_max = 9 and the urgency range be J_min = 10 to J_max = 70. Formula (6) gives the normalized parallelism P_{1-normal} of task1 to task3 as 0.25 and the normalized parallelism P_{2-normal} of task4 to task8 as 0.5.
Formula (7) gives the normalized urgencies J_{11-normal}, J_{12-normal}, J_{13-normal} of task1 to task3 as 0.233333, 0.516667, and 0.750, and the normalized urgencies J_{21-normal}, J_{22-normal}, J_{23-normal}, J_{24-normal}, J_{25-normal} of task4 to task8 as 0.350, 0.316667, 0.616667, 0.90, and 0.916667.
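Formulas (6) and (7) are the same min-max normalization applied to two ranges. The Python sketch below uses a hypothetical helper `min_max` and takes the urgencies exactly as listed in the text as its input. It reproduces the listed normalized values except for task5: an urgency of 27 normalizes to 0.283333, whereas the text lists 0.316667.

```python
def min_max(value: float, lower: float, upper: float) -> float:
    """Formulas (6) and (7): (x - x_min) / (x_max - x_min)."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return (value - lower) / (upper - lower)


P_MIN, P_MAX = 1, 9    # parallelism range from the embodiment
J_MIN, J_MAX = 10, 70  # urgency range from the embodiment

p_norm = {1: min_max(3, P_MIN, P_MAX), 2: min_max(5, P_MIN, P_MAX)}
print(p_norm)  # {1: 0.25, 2: 0.5}

# Urgencies as listed in the text for task1..task8.
j_listed = [24, 41, 55, 31, 27, 47, 64, 65]
j_norm = [round(min_max(j, J_MIN, J_MAX), 6) for j in j_listed]
print(j_norm)
# [0.233333, 0.516667, 0.75, 0.35, 0.283333, 0.616667, 0.9, 0.916667]
```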
Formula (8) gives the priorities V_{11}, V_{12}, V_{13}, V_{21}, V_{22}, V_{23}, V_{24}, V_{25} of task1 to task8 as 0.483333×10^6, 0.766667×10^6, 1.0×10^6, 0.85×10^6, 0.816667×10^6, 1.116667×10^6, 1.4×10^6, and 1.416667×10^6. Sorting the tasks by priority value gives: task8, task7, task6, task3, task4, task5, task2, task1.
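The priority values and the resulting order can be sketched in Python as follows. The inputs are the normalized parallelism and urgency values exactly as listed in the text; the function name `priority_value` is hypothetical, and rounding to an integer is an added assumption, made because a Kubernetes priority value is an integer.

```python
K_PRIME = 1_000_000  # scaling factor k' stated for step g)


def priority_value(p_norm: float, j_norm: float, k_prime: int = K_PRIME) -> int:
    """Formula (8): V = k' * (P_normal + J_normal), rounded to an integer."""
    return round(k_prime * (p_norm + j_norm))


# (normalized parallelism, normalized urgency) as listed in the text.
normalized = {
    "task1": (0.25, 0.233333), "task2": (0.25, 0.516667),
    "task3": (0.25, 0.750), "task4": (0.5, 0.350),
    "task5": (0.5, 0.316667), "task6": (0.5, 0.616667),
    "task7": (0.5, 0.90), "task8": (0.5, 0.916667),
}
values = {name: priority_value(p, j) for name, (p, j) in normalized.items()}

# Higher priority value first.
order = sorted(values, key=values.get, reverse=True)
print(order)
# ['task8', 'task7', 'task6', 'task3', 'task4', 'task5', 'task2', 'task1']
```

No two tasks share a priority value here, so the tie-breaking rules of step h) are not needed for this example.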
At this point the 8 tasks are mapped to 8 pods on Kubernetes, and the 8 pods are scheduled in this order onto worker nodes that satisfy their resource requirements.
For complex tasks such as machine learning tasks, each of the 8 tasks above is mapped to several pods, each pod corresponding to one subtask; that is, one task is mapped to a pod group containing several pods. Suppose the highest-priority task, task8, needs to run 5 pods when it executes; pod 8 is then a pod group made up of 5 pods. Fig. 4 shows the directed acyclic graph of the dependencies within the pod group.
Next, the priorities of the 5 pods within pod group 8 are considered on their own. The 5 pods in the group have dependencies: some pods are prerequisites for others. As shown in Fig. 4, the default pod sequence is:
pod1→pod2→pod3→pod4→pod5
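The topological sequence is computed from the directed graph. First, vertex 1, whose in-degree is 0, is selected as the starting point, and a vertex with out-degree 0 is sought along any directed edge; for example, following vertices 1, 2, 3, 4 reaches vertex 4, which is pushed onto the stack. Returning to the previous vertex 3, vertex 3 has out-degree 0 apart from its edge to vertex 4, so vertex 3 is pushed onto the stack. Returning to vertex 2, the predecessor of vertex 3, its out-degree is not 0; since vertex 3 is already on the stack, the directed edge to vertex 5 is followed, and because vertex 4 is already on the stack, vertex 5 has out-degree 0 and is pushed onto the stack. Returning again to vertex 2, its out-degree is now 0, so vertex 2 is pushed onto the stack. Returning to vertex 1, its out-degree is now 0, so vertex 1 is pushed onto the stack. All vertices have now been pushed in the order 4, 3, 5, 2, 1. By the first-in, last-out principle of the stack, the vertices are popped in the order 1, 2, 5, 3, 4, which is the topological sequence of the directed graph. The priority sequence is therefore:
pod1→pod2→pod5→pod3→pod4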
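The walkthrough corresponds to a depth-first traversal that pushes a vertex once all of its successors are on the stack, followed by popping the stack. The Python sketch below is one way to express it. The edge list is an assumption inferred from the walkthrough (1→2, 2→3, 3→4, 2→5, 5→4), since Fig. 4 is not reproduced here; the function name `pod_priority_order` is hypothetical; successors are visited in edge-list order rather than at random; and the input is assumed to be acyclic.

```python
def pod_priority_order(vertices, edges):
    """Return vertices in pop order: highest in-group priority first."""
    successors = {v: [] for v in vertices}
    indegree = {v: 0 for v in vertices}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    stack = []      # push order (steps h-2 and h-3)
    pushed = set()  # vertices already on the stack

    def visit(v):
        # Follow every edge whose target is not yet on the stack.
        for nxt in successors[v]:
            if nxt not in pushed:
                visit(nxt)
        # Ignoring pushed vertices, v now has out-degree 0: push it.
        pushed.add(v)
        stack.append(v)

    for v in vertices:
        if indegree[v] == 0 and v not in pushed:
            visit(v)

    # Step h-4: pop everything; the last vertex pushed comes out first.
    return stack[::-1]


# Assumed edges for the 5 pods of the pod group in the embodiment.
edges = [(1, 2), (2, 3), (3, 4), (2, 5), (5, 4)]
print(pod_priority_order([1, 2, 3, 4, 5], edges))  # [1, 2, 5, 3, 4]
```

With this edge list the push order is 4, 3, 5, 2, 1, matching the walkthrough, and the pop order places every pod ahead of the pods that depend on it.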
Priorities must therefore be assigned to the pods in the group from high to low in this order. First, the pod priorities a, b, c, d, and e are defined; then, in the podGroup-status YAML file, pod.spec.PriorityClassName specifies the name of the priority to use, which completes the declaration.
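As an illustration of the declaration step, the Python sketch below builds the corresponding manifests as plain dictionaries. It uses the Kubernetes `PriorityClass` kind (API group `scheduling.k8s.io/v1`) and the pod field `spec.priorityClassName`. The numeric values assigned to a through e, the container name, and the image name are hypothetical placeholders, and the helper names are not taken from the patent.

```python
import json


def priority_class(name: str, value: int) -> dict:
    """Manifest for one user-defined priority."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
    }


def pod_manifest(pod_name: str, class_name: str, image: str) -> dict:
    """Pod manifest that declares which priority it uses."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "priorityClassName": class_name,
            "containers": [{"name": "worker", "image": image}],
        },
    }


# Priority sequence from the embodiment, highest first.
pod_order = ["pod1", "pod2", "pod5", "pod3", "pod4"]
class_names = ["a", "b", "c", "d", "e"]
class_values = [500, 400, 300, 200, 100]  # hypothetical descending values

classes = [priority_class(n, v) for n, v in zip(class_names, class_values)]
pods = [
    pod_manifest(pod, cls, "example.invalid/worker:latest")  # placeholder image
    for pod, cls in zip(pod_order, class_names)
]
print(json.dumps(pods[2], indent=2))  # pod5 declares priority class "c"
```

Only the relative order of the five values matters for the in-group ordering; how they should relate to the task-level priority values from formula (8) is not specified in the text.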
This priority does not take part in setting the priority from parallelism and urgency. It applies only to ordering the pods within a pod group that has internal dependencies, after priority scheduling between pod groups is complete.
The specific tasks above show that, because task parallelism and task urgency are considered when the priority is set, the priority setting becomes more fine-grained, more standardized, and more reasonable, and the task execution success rate can be raised effectively when parallel tasks and urgent tasks need resources. In addition, for group scheduling of machine learning tasks, a second level of priority setting handles the dependencies among the pods in a group.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110244427.6A CN112965797B (en) | 2021-03-05 | 2021-03-05 | Combined priority scheduling method for complex tasks under Kubernetes environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112965797A (en) | 2021-06-15 |
CN112965797B (en) | 2022-02-22 |
Family
ID=76276619
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110244427.6A Active CN112965797B (en) | 2021-03-05 | 2021-03-05 | Combined priority scheduling method for complex tasks under Kubernetes environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112965797B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113254179B (en) * | 2021-06-03 | 2022-03-01 | 核工业理化工程研究院 | Job scheduling method, system, terminal and storage medium based on high response ratio |
CN114444700A (en) * | 2021-12-31 | 2022-05-06 | 华南师范大学 | Quantum cloud computing platform job scheduling and resource allocation method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109271233A (en) * | 2018-07-25 | 2019-01-25 | 上海数耕智能科技有限公司 | The implementation method of Hadoop cluster is set up based on Kubernetes |
CN111367644A (en) * | 2020-03-17 | 2020-07-03 | 中国科学技术大学 | Task scheduling method and device for heterogeneous fusion system |
CN111858069A (en) * | 2020-08-03 | 2020-10-30 | 网易(杭州)网络有限公司 | Cluster resource scheduling method and device and electronic equipment |
CN111930525A (en) * | 2020-10-10 | 2020-11-13 | 北京世纪好未来教育科技有限公司 | GPU resource use method, electronic device and computer readable medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7567360B2 (en) * | 2003-03-27 | 2009-07-28 | Canon Kabushiki Kaisha | Image forming system, method and program of controlling image forming system, and storage medium |
Non-Patent Citations (2)
Title |
---|
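Research on Resource Scheduling Strategies for Kubernetes-Based Container Clusters; Ma Xilin; China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology; 2019-12-15 (No. 12); full text * |
I/O Performance Analysis of Hadoop Applications on Virtualization Platforms; Guo Mengying et al.; Journal of Computer Research and Development; 2015-12-15; Vol. 52 (No. S2); full text * |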
Also Published As
Publication number | Publication date |
---|---|
CN112965797A (en) | 2021-06-15 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111768006B (en) | Training method, device, equipment and storage medium for artificial intelligent model | |
Zhou et al. | A list scheduling algorithm for heterogeneous systems based on a critical node cost table and pessimistic cost table | |
CN107015856A (en) | Task scheduling approach generation method and device under cloud environment in scientific workflow | |
CN112114950A (en) | Task scheduling method and device and cluster management system | |
CN117687774B (en) | Task model training method for computing power scheduling and computing power scheduling method and system | |
CN112965797B (en) | Combined priority scheduling method for complex tasks under Kubernetes environment | |
CN111913800B (en) | Resource allocation method for optimizing cost of micro-service in cloud based on L-ACO | |
Muthusamy et al. | Cluster-based task scheduling using K-means clustering for load balancing in cloud datacenters | |
CN108427602B (en) | A collaborative scheduling method and device for distributed computing tasks | |
CN112114973A (en) | Data processing method and device | |
Li et al. | Cost-efficient fault-tolerant workflow scheduling for deadline-constrained microservice-based applications in clouds | |
CN106201681B (en) | Task scheduling method based on pre-release resource list under Hadoop platform | |
CN116701001A (en) | Target task allocation method and device, electronic equipment and storage medium | |
CN110008002B (en) | Job scheduling method, device, terminal and medium based on stable distribution probability | |
CN115562833A (en) | Workflow optimization scheduling method based on improved goblet sea squirt algorithm | |
CN112363819B (en) | Big data task dynamic arrangement scheduling method and device and computing equipment | |
CN113608858A (en) | MapReduce architecture-based block task execution system for data synchronization | |
CN110084507A (en) | The scientific workflow method for optimizing scheduling of perception is classified under cloud computing environment | |
CN118569358A (en) | Distributed computation scheduling method, device and equipment for model and storage medium | |
CN118277062A (en) | Job scheduling method, device, computer equipment and storage medium | |
Gu et al. | Maximizing workflow throughput for streaming applications in distributed environments | |
Zhou et al. | Performance analysis of scheduling algorithms for dynamic workflow applications | |
Cao et al. | A fault-tolerant workflow mapping algorithm under end-to-end delay constraint | |
Burkimsher | Fair, responsive scheduling of engineering workflows on computing grids | |
Yang et al. | Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||