WO2008148625A1

WO2008148625A1 - Method and device for scheduling a predictable operation of an algorithm on a multi-core processor

Info

Publication number: WO2008148625A1
Application number: PCT/EP2008/055910
Authority: WO
Inventors: Andrey Nechypurenko; Egon Wuchner
Original assignee: Siemens AG; Siemens Corp
Current assignee: Siemens AG; Siemens Corp
Priority date: 2007-06-05
Filing date: 2008-05-14
Publication date: 2008-12-11
Anticipated expiration: 2009-12-05

Abstract

The invention describes a method for scheduling a predictable operation of an algorithm (ALG) on a multi-core processor comprising a plurality of parallel working cores (P1,..., P8), comprising the steps of - creating a model of the algorithm (ALG), thereby identifying tasks (t1,...,t22) of the algorithm (ALG) and at least one characteristic of each of the tasks (t1,..., t22); assigning each identified task (t1,...,t22) according to its at least one characteristic to at least some of the plurality of cores (P1,..., P8) of the multi-core processor; and determining a starting point in time for each of the tasks (t1,...,t22) of the algorithm (ALG).

Description

Method and device for scheduling a predictable operation of an algorithm on a multi-core processor

The invention relates to a method for scheduling a predictable operation of an algorithm on a multi-core processor comprising a plurality of parallel working cores. The invention further relates to a device for creating a schedule for a predictable operation of an algorithm on a multi-core processor comprising a plurality of parallel working cores. The invention further relates to a computer program product.

Normally, an algorithm comprises a plurality of different tasks, each task having a specific duration. Furthermore, there are task dependencies between some of the tasks. These tasks are called interdependent tasks. The operation time of an algorithm depends on many aspects, especially the task dependencies, task durations and a so-called domain-motivated explicit mapping of tasks or task groups to certain cores of the multi-core processor. Improving the operation time of an algorithm implies a better data-throughput or a better task scalability in general. Especially, real-time or life- processing algorithms which are used for medical instruments need feasible run-time predictions with regard to their operation time.

Even in case that run-time predictions are not needed the dynamic scheduling of an algorithm with interdependent tasks has to be based on their task dependencies and durations. The tasks of the algorithm have to be executed in sequence of their specified dependencies. In addition, there has to be a certain kind of fairness with respect to all dispatched algorithms processed by the multi-core processor at a certain time. By executing a plurality of algorithms in parallel, it has to be guaranteed that no previous algorithm starves out. On the contrary, an algorithm should be able to be completed before a new or the same algorithm starts. By now, when using so-called symmetric multi-core processors (as provided by INTEL or AMD) task scheduling is under the control of the operating system. A symmetric multi-core proc- essor means that all cores of the multi-core processor have the same capabilities. Even real-time operating systems, like QNX, schedule their execution units, i.e. tasks, threads or processes, in a transparent way according to priorities, task states and queues. Executing an algorithm with a real-time operating system therefore makes it necessary to assign a priority to each task of the algorithm. Several approaches are known how to map task to priorities in order to derive a feasible and predictable algorithm duration. Unfortunately, such a mapping has to be done manually by the programmer lev- eraging his experience and knowledge. Furthermore, this mechanism is hardly applicable to algorithms having interdependent tasks. A further problem concerns task starvation and priority inversion, among others.

Even when using so-called asymmetric multi-core processors or a combination of different multi-core units, like the so- called IBM Cell Broadcasting Engine (CBE) and graphical processing units (GPUs) , the scheduling of the tasks of an algorithm has to be done manually. The scheduling result has to be programmed explicitly. Furthermore, dynamic scheduling of algorithms on a general purpose operation system (e.g. Windows, LINUX) does not provide any guarantee with respect to completion order or completion time.

Especially in case of medical applications consisting of interdependent tasks all of the above mentioned suggestions how to treat algorithms with a plurality of independent tasks are not suitable since a predictable operation of the algorithm is necessary.

It is therefore an object of the present invention to provide a method for scheduling a predictable operation of an algorithm on a multi-core processor. It is a further object of the present invention to provide a device for creating a schedule for a predictable operation of an algorithm on a multi-core processor. Furthermore, a computer program product directly loadable into the internal memory of a digital com- puter has to be provided.

These objects are solved by a method according to claim 1, a computer program product according to claim 11 and a device according to claim 12. Preferred embodiments of the invention are set out in the dependent claims.

A method for scheduling a predictable operation of an algorithm on a multi-core processor, the multi-core processor comprising a plurality of parallel working cores, comprises the steps of creating a model of the algorithm, thereby identifying tasks of the algorithm and at least one characteristic of each of the tasks; assigning each identified task according to its at least one characteristic to at least some of the plurality of cores of the multi-core processor; and determining a starting point in time for each of the tasks of the algorithm.

Scheduling a predictable operation means that the overall time of the algorithm, the needed number of cores for executing the algorithm and the resulting core utilization of the number of cores is identical with each execution of the algorithm.

The invention is based on the knowledge that the operation time of an algorithm is highly dependent on how tasks of an algorithm are mapped to the cores of a multi-core processor. Creating a model of the algorithm allows identifying paral- lelization options, parallel tasks and the data exchange pattern between these tasks in order to exploit the power of the multi-core processor. The schedule for the algorithm allows a predictable operation regardless if the algorithm is executed on a multi-core processor under control of a general purpose operating system or a real-time operating system. Therefore, the method according to the invention allows providing algorithms which can be used for critical applications, e.g. medical measurements. An advantage over the prior art is that the method according to the invention can be carried out automatically by using a computer system instead of a manual or brute-force approach. Therefore, the method according to the invention may be used for algorithms with changing tasks, changing dependencies and durations because scheduling a new predictable operation of the algorithm can be done very fast and even dynamically.

According to a further improvement of the invention, the at least one characteristic of each of the tasks comprises one or more of the following parameters: dependencies between the tasks; duration of a task; data exchange between the tasks. By considering these characteristics of each of the tasks it is possible to provide a schedule for a predictable operation of the algorithm.

According to a further embodiment, the step for creating a model of the algorithm comprises identifying properties of the algorithm. The properties of the algorithm to be consid- ered comprise one or more of the following parameters: preferred or needed cores for a specific task; a group of tasks which has to be executed by one specific core. The preferred or needed cores for a specific task are called core candidates per task. A group of tasks which has to be executed by one specific core is named a group of tasks to be deployed to one core.

According to a further embodiment, the number of cores to be used for executing the algorithm is predetermined according to an overall time-optimization of the algorithm. Additionally or alternatively, the number of cores to be used for executing the algorithm is predetermined according to an optimized core utilization of at least one of the cores of the multi-core processor. Which of these two side conditions will be used for assigning each identified task to the cores of the multi-core processor depends on the amount of cores of the multi-core processor to be used, the number of algorithms which will be executed in parallel on the processor and the operation system for controlling the multi-core processor.

The step of creating the model of the algorithm may be carried out based on a computer. Alternatively, it is also pos- sible that the model of the algorithm is created by the programmer. The step of creating the model of the algorithm comprises identifying tasks, their dependencies, data exchanges, task durations, core candidates per task and task groups to be deployed to one core. With this information an optimized algorithm can be exploited to suggest the right scheduling of tasks to available cores.

The step of assigning the identified tasks to at least some of the amount of cores of the multi-core processor is carried out by an optimization algorithm or by a heuristics. Thereby, use of known optimization algorithms and heuristics is possible, like the optimization methods or heuristics "Simplex", "Simulated Annealing" or "Best-First Searches".

According to a further embodiment the step of assigning the identified tasks to at least some of the amount of cores of the multi-core processor is carried out dependent on the properties of an operating system controlling the multi-core processor. Use of a general purpose operating system or a real-time operating system can be made to provide an optimized schedule of the algorithm.

The steps of assigning the identified tasks to at least some of the amount of cores of the multi-core processor and of de- termining the critical date for each of the tasks of the algorithm is carried out during operation of the multi-core processor to provide adaptive dynamic scheduling of the algorithm. With other words, providing the schedule of the algo- rithm can be done during executing the algorithm thereby considering possible other algorithms being executed on the multi-core processor.

According to a second aspect of the invention a computer program product directly loadable into the internal memory of a digital computer is suggested, the computer program product comprising software code portions for performing the steps of the method according to the invention when said product is run on a computer.

According to a third aspect of the invention a device for creating a schedule for a predictable operation of an algorithm on a multi-core processor comprising a plurality of parallel working cores is suggested. The device comprises a first means for creating a model of the algorithm, thereby identifying tasks of the algorithm and at least one characteristic of each of the tasks; a second means for assigning each identified task according to its at least one character- istic to at least some of the plurality of cores of the multi-core processor; and a third means for determining a starting point in time for each of the tasks of the algorithm.

A device according to the invention has the same advantages as pointed out in connection with the described method.

Furthermore, the device according to the invention comprises further means for executing the method steps as set out above.

The invention will be described in more detail by reference to the figures.

Fig. 1 shows a simplified already analyzed model of an algorithm containing a plurality of tasks, and Fig. 2 shows a schedule of the tasks of the algorithm of

Fig. 1 assigned to a plurality of cores of a multi- core processor.

Fig. 1 shows a simplified model of an algorithm ALG consisting of tasks tO to t22. Each of the tasks tθ,..., t22 has a specific duration which is indicated by reference numeral "delta". For instance, task tO has a duration of delta = 200 ms . Task t7 has a duration of delta = 50 ms . Furthermore, Fig. 1 shows a dependency between some of the tasks. For example the dependency between task tO and task t7 is indicated with dl . This means, task t7 has to wait for some input information from task tO. Accordingly, tasks t6, t8 and tlO need input information from task t7. The dependencies are outlined with d4, d5 and d6. In an appropriate manner further dependencies dl to d32 between further interdependent tasks are indicated.

Fig. 1 shows the algorithm after performing the step of cre- ating a model of the algorithm, thereby identifying the tasks tl,..., t22 of the algorithm ALG and identifying at least one characteristic of each of the tasks. The characteristics used for optimization and shown in Fig. 1 are the duration of each of the tasks and dependencies to other tasks. However, addi- tional or alternative characteristics could be used as well.

In general, the duration of the algorithm ALG depends on the task dependencies, on the task-to-core deployment and on task-to-priority mapping on the symmetric multi-core proces- sors, if priorities are assigned to each task.

To provide a predictable time-optimized and core-bound operation of the algorithm ALG a schedule of the algorithm ALG will be determined automatically by optimization algorithms or heuristics. Applying a time-optimization to the algorithm shown in Fig. 1 by using e.g. eight cores of a multi-core processor results in an overall duration of 1850 ms . The resulting task-to-core scheduling is outlined in Fig. 2. As can be seen from Fig. 2 each of the tasks tO,..., t22 is assigned to a specific core Pl,..., P8 : Task t22 is assigned to core Pl. Tasks tl, tl9 and t4 are assigned to core P4. Tasks t5, tl8 and til are assigned to core P5. Tasks tO, t6, tl5 and tl7 are assigned to core P6. Tasks t7, t8, t9, tl4 and t2 are assigned to core P7. Tasks tl3 and t21 are assigned to core P2. Tasks tlβ, t20 and t3 are assigned to core P3. Tasks tlO and tl2 are assigned to core P8.

Furthermore, the dependencies dl,..., d32 which are outlined in Fig. 1 are shown in Fig. 2, too. As can be seen from Fig. 2, the critical path, i.e. the path which determines the overall time of the algorithm, comprises tasks tO (executed by core P6) , task tl (executed by core P4), task t22 (executed by core Pl) and task t3 (executed by core P3) . By adding all of the time durations of these tasks tO, tl, t22 and t3 the overall time of 1850 ms of the algorithm is obtained. Since task t3 is dependent from task t22, task t22 is dependent from task tl and task tl is dependent from task tO a reduc- tion of time compared to the overall time of 1850 ms is not possible. On the other hand, if the execution of these four tasks tO, tl, t22 and t3 is not optimized a prolongation in the overall time of execution for the algorithm ALG will be the result.

Fig. 2 shows which of the tasks have to be executed on which of the cores of the multi-core processor. It is possible that all of the processors execute some of the tasks in parallel. Nevertheless, there may be an interruption, i.e. a pause, be- tween two consecutive tasks running on the same core. This depends on the determination of a starting point in time for each of the tasks of the algorithm.

A person skilled in the art will recognize that any change to the algorithm structure, e.g. any change with regard to task durations or changing of task dependencies or the available number of cores, results in triggering the time-optimization anew to determine the respective core-bound scheduling. Applying optimization techniques to determine an explicit deployment and scheduling of tasks to cores exploits the increasing number of cores of multi-core processors. This mechanism allows adapting the computing power of multiple cores of a processor to the specific needs of a complex algorithm of interdependent tasks. The scheduling solution mechanism proposed allows predicting best or acceptable operation times, the needed number of cores and the resulting core utilization.

Time-optimized and core-bound scheduling solutions per algorithm are applicable to any multi-core processors and even scheduling policies of a real-time or even a general-purpose operating system. Depending on the current system and core load, respectively, a scheduling policy of an operating system can vary the number of available cores it is able to provide to one of more parallel executed algorithms. The proposed scheduling solution mechanism either results in better time-optimizations and less core utilization or vice versa, depending on the number of available cores. This way a general purpose operating system can do dynamic scheduling of many algorithms or applications adaptively and can keep certain deterministic properties like non-starvation or the com- pletion order of running algorithms with respect to their starting times, durations and cores of the multi-core processor .

By modelling and specifying, respectively, the structure of an algorithm in terms of its tasks, task dependencies, task durations, core candidates of each tasks and one-core task groups (as pointed out in Fig. 1) time-optimization techniques allow suggesting the appropriate scheduling solutions in an automated way. Automated scheduling suggestions provide a necessary flexibility in figuring out the right core- scheduling in case of varying algorithm characteristics like task dependencies or single task durations. In addition to a real-time operating system time-optimized and core-bound scheduling solutions can be exploited even by the scheduling policy of a general purpose operating system in an adaptive way. Depending on the current system load and the remaining number of available cores, the scheduling policy can provide a starting algorithm with a maximum number of cores to use. This means, if a multi-core processor comprises for example of twelve cores only eight of them may be scheduled to a specific algorithm to ensure a predictable opera- tion of the algorithm.

The more cores an algorithm is able to use, the better overall time-optimization is possible. But the core utilization is certainly not on maximum level. In contrast, the fewer cores the algorithm is allowed to use, the better core utilization is possible. However, there is the drawback of longer overall operation times despite a time-optimization.

This mechanism of adaptively allocating a number of cores to a starting algorithm and assigning its task to these reserved cores guarantees non-starvation and deterministic operation time of the algorithm. This applies even in time of task duration times differing from their values used for the time- optimization .

Time-optimized and core-bound scheduling policies are applicable to the whole range of multi-core systems either asymmetric, symmetric and clusters of different multi-core processors .

Asymmetric multi-core processor

An asymmetric multi-core processor consists of one master core (PPE) and for example eight worker cores (SPEs) . An ex- ample an asymmetric multi-core processor is the so-called IBM cell in which the master core comprises the operating system, whereas the worker cores are not under control of the operating system. Time-optimized and core-bound algorithm schedul- ing solutions are exploited for a given number of cores. The algorithm can be scheduled programmatically in an explicit way according to its time-optimized and SPE-bound computed scheduling solution.

Symmetric multi-core processors

By using a real-time operating system it is possible to leverage the predictability property of time-based, i.e. time- optimized, and core-bound scheduling.

On a general purpose operating system, such as Windows or LINUX, time-optimized and core-bound scheduling solutions can be applied in an adaptive way. By assuming a high number of available cores, the suggested scheduling solution provides the best-possible optimized overall operation time with probably limited core utilization. On the other hand, reducing the number of cores results in better core utilization and slower optimized algorithm times.

For instance, the operating system on an 80-core processor could be more generous with respect to available cores per algorithm in case of fewer running other algorithms and applications, respectively. This results in short (time- optimized) operation times of algorithms and applications, respectively, thus making cores available as early as possible. On the other hand, algorithms and applications, respectively, repeating with a certain periodicity could be offered less cores to be utilized efficiently due to the algorithm longevity. In case of no more available cores, the operating system has to context switch the state of all cores used by an algorithm and let another algorithm use this pre-defined number of cores.

Cluster of different multi-core machines

Time-optimized and core-bound algorithm scheduling solutions are used by considering a given number of cores and a valid task-to-processor core assignment. The algorithm can be scheduled programmatically in an explicit way according to its time-optimized and machine-and-core-bound computed scheduling solution.

Claims

Patent Claims

1. A method for scheduling a predictable operation of an algorithm (ALG) on a multi-core processor comprising a plural- ity of parallel working cores (Pl,..., P8), comprising the steps of creating a model of the algorithm (ALG) , thereby identifying tasks (tl,...,t22) of the algorithm (ALG) and at least one characteristic of each of the tasks (tl, ..., t22) ; assigning each identified task (tl,...,t22) according to its at least one characteristic to at least some of the plurality of cores (Pl,..., P8) of the multi-core processor; and - determining a starting point in time for each of the tasks (tl,...,t22) of the algorithm (ALG).

2. Method according to claim 1, wherein the at least one characteristic of each of the tasks (tl,...,t22) comprises one or more of the following parameters: dependencies (dl,...,d32) between the tasks (tl, ...,t22) , duration (delta) of a task (tl, ... , t22) , data exchange between the tasks (tl, ... , t22) .

3. Method according to claim 1 or 2, wherein the stop of creating a model of the algorithm (ALG) comprises identifying properties of the algorithm.

4. Method according to claim 3, wherein the properties of the algorithm (ALG) to be considered comprise one or more of the following parameters: preferred or needed cores (Pl,..., P8) for a specific task (tl, ... , t22) , - a group of tasks (tl,...,t22) which has to be executed by one specific core (Pl,..., P8).

5. Method according to one of the preceding claims, wherein the number of cores (Pl,..., P8) to be used for executing the algorithm (ALG) is predetermined according to an overall time-optimization of the algorithm (ALG) .

6. Method according to one of the preceding claims, wherein the number of cores (Pl,..., P8) to be used for executing the algorithm (ALG) is predetermined according to an optimized core utilization of at least one of the cores (Pl,..., P8) of the multi-core processor.

7. Method according to one of the preceding claims, wherein the step of creating the model of the algorithm (ALG) is carried out based on a computer.

8. Method according to one of the preceding claims, wherein the step of assigning the identified tasks (tl,...,t22) to at least some of the amount of cores (Pl,..., P8) of the multi- core processor is carried out by an optimization algorithm (ALG) or by a heuristics.

9. Method according to one of the preceding claims, wherein the step of assigning the identified tasks (tl,...,t22) to at least some of the amount of cores of the multi-core processor is carried out dependent on the properties of an operating system controlling the multi-core processor.

10. Method according to one of the preceding claims, wherein the steps of assigning the identified tasks (tl,...,t22) to at least some of the amount of cores (Pl,..., P8) of the multi-core processor and of determining the critical date for each of the tasks (tl,...,t22) of the algorithm (ALG) is carried out during operation of the multi-core processor to provide adaptive dynamic scheduling of the algorithm (ALG) .

11. Computer program product directly loadable into the internal memory of a digital computer, comprising software code portions for performing the steps of one of the preceding claims when said product is run on a computer.

12. A device for creating a schedule for a predictable opera- tion of an algorithm (ALG) on a multi-core processor comprising an plurality of parallel working cores (Pl,..., P8), comprising: a first means for creating a model of the algorithm (ALG), thereby identifying tasks (tl,...,t22) of the al- gorithm (ALG) and at least one characteristic of each of the tasks (tl, ... , t22) ; a second means for assigning each identified task (tl,...,t22) according to its at least one characteristic to at least some of the plurality of cores (Pl,..., P8) of the multi-core processor; and a third means for determining a starting point in time for each of the tasks (tl,...,t22) of the algorithm (ALG) .

13. Device according to claim 12, which comprises further means for executing the method steps according to claims 2 to 10.