[go: up one dir, main page]

WO2008148625A1 - Method and device for scheduling a predictable operation of an algorithm on a multi-core processor - Google Patents

Method and device for scheduling a predictable operation of an algorithm on a multi-core processor Download PDF

Info

Publication number
WO2008148625A1
WO2008148625A1 PCT/EP2008/055910 EP2008055910W WO2008148625A1 WO 2008148625 A1 WO2008148625 A1 WO 2008148625A1 EP 2008055910 W EP2008055910 W EP 2008055910W WO 2008148625 A1 WO2008148625 A1 WO 2008148625A1
Authority
WO
WIPO (PCT)
Prior art keywords
algorithm
tasks
alg
cores
core processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2008/055910
Other languages
French (fr)
Inventor
Andrey Nechypurenko
Egon Wuchner
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens AG
Siemens Corp
Original Assignee
Siemens AG
Siemens Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens AG, Siemens Corp filed Critical Siemens AG
Publication of WO2008148625A1 publication Critical patent/WO2008148625A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5066Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs

Definitions

  • the invention relates to a method for scheduling a predictable operation of an algorithm on a multi-core processor comprising a plurality of parallel working cores.
  • the invention further relates to a device for creating a schedule for a predictable operation of an algorithm on a multi-core processor comprising a plurality of parallel working cores.
  • the invention further relates to a computer program product.
  • an algorithm comprises a plurality of different tasks, each task having a specific duration. Furthermore, there are task dependencies between some of the tasks. These tasks are called interdependent tasks.
  • the operation time of an algorithm depends on many aspects, especially the task dependencies, task durations and a so-called domain-motivated explicit mapping of tasks or task groups to certain cores of the multi-core processor. Improving the operation time of an algorithm implies a better data-throughput or a better task scalability in general. Especially, real-time or life- processing algorithms which are used for medical instruments need feasible run-time predictions with regard to their operation time.
  • a symmetric multi-core proc- essor means that all cores of the multi-core processor have the same capabilities.
  • real-time operating systems like QNX, schedule their execution units, i.e. tasks, threads or processes, in a transparent way according to priorities, task states and queues. Executing an algorithm with a real-time operating system therefore makes it necessary to assign a priority to each task of the algorithm.
  • Several approaches are known how to map task to priorities in order to derive a feasible and predictable algorithm duration. Unfortunately, such a mapping has to be done manually by the programmer lev- eraging his experience and knowledge. Furthermore, this mechanism is hardly applicable to algorithms having interdependent tasks.
  • a further problem concerns task starvation and priority inversion, among others.
  • a method for scheduling a predictable operation of an algorithm on a multi-core processor comprises the steps of creating a model of the algorithm, thereby identifying tasks of the algorithm and at least one characteristic of each of the tasks; assigning each identified task according to its at least one characteristic to at least some of the plurality of cores of the multi-core processor; and determining a starting point in time for each of the tasks of the algorithm.
  • Scheduling a predictable operation means that the overall time of the algorithm, the needed number of cores for executing the algorithm and the resulting core utilization of the number of cores is identical with each execution of the algorithm.
  • the invention is based on the knowledge that the operation time of an algorithm is highly dependent on how tasks of an algorithm are mapped to the cores of a multi-core processor. Creating a model of the algorithm allows identifying paral- lelization options, parallel tasks and the data exchange pattern between these tasks in order to exploit the power of the multi-core processor.
  • the schedule for the algorithm allows a predictable operation regardless if the algorithm is executed on a multi-core processor under control of a general purpose operating system or a real-time operating system. Therefore, the method according to the invention allows providing algorithms which can be used for critical applications, e.g. medical measurements.
  • An advantage over the prior art is that the method according to the invention can be carried out automatically by using a computer system instead of a manual or brute-force approach. Therefore, the method according to the invention may be used for algorithms with changing tasks, changing dependencies and durations because scheduling a new predictable operation of the algorithm can be done very fast and even dynamically.
  • the at least one characteristic of each of the tasks comprises one or more of the following parameters: dependencies between the tasks; duration of a task; data exchange between the tasks.
  • the step for creating a model of the algorithm comprises identifying properties of the algorithm.
  • the properties of the algorithm to be consid- ered comprise one or more of the following parameters: preferred or needed cores for a specific task; a group of tasks which has to be executed by one specific core.
  • the preferred or needed cores for a specific task are called core candidates per task.
  • a group of tasks which has to be executed by one specific core is named a group of tasks to be deployed to one core.
  • the number of cores to be used for executing the algorithm is predetermined according to an overall time-optimization of the algorithm. Additionally or alternatively, the number of cores to be used for executing the algorithm is predetermined according to an optimized core utilization of at least one of the cores of the multi-core processor. Which of these two side conditions will be used for assigning each identified task to the cores of the multi-core processor depends on the amount of cores of the multi-core processor to be used, the number of algorithms which will be executed in parallel on the processor and the operation system for controlling the multi-core processor.
  • the step of creating the model of the algorithm may be carried out based on a computer. Alternatively, it is also pos- sible that the model of the algorithm is created by the programmer.
  • the step of creating the model of the algorithm comprises identifying tasks, their dependencies, data exchanges, task durations, core candidates per task and task groups to be deployed to one core. With this information an optimized algorithm can be exploited to suggest the right scheduling of tasks to available cores.
  • the step of assigning the identified tasks to at least some of the amount of cores of the multi-core processor is carried out by an optimization algorithm or by a heuristics.
  • an optimization algorithm or by a heuristics.
  • the step of assigning the identified tasks to at least some of the amount of cores of the multi-core processor is carried out dependent on the properties of an operating system controlling the multi-core processor.
  • Use of a general purpose operating system or a real-time operating system can be made to provide an optimized schedule of the algorithm.
  • the steps of assigning the identified tasks to at least some of the amount of cores of the multi-core processor and of de- termining the critical date for each of the tasks of the algorithm is carried out during operation of the multi-core processor to provide adaptive dynamic scheduling of the algorithm.
  • providing the schedule of the algo- rithm can be done during executing the algorithm thereby considering possible other algorithms being executed on the multi-core processor.
  • a computer program product directly loadable into the internal memory of a digital computer comprising software code portions for performing the steps of the method according to the invention when said product is run on a computer.
  • a device for creating a schedule for a predictable operation of an algorithm on a multi-core processor comprising a plurality of parallel working cores.
  • the device comprises a first means for creating a model of the algorithm, thereby identifying tasks of the algorithm and at least one characteristic of each of the tasks; a second means for assigning each identified task according to its at least one character- istic to at least some of the plurality of cores of the multi-core processor; and a third means for determining a starting point in time for each of the tasks of the algorithm.
  • a device according to the invention has the same advantages as pointed out in connection with the described method.
  • the device according to the invention comprises further means for executing the method steps as set out above.
  • Fig. 1 shows a simplified already analyzed model of an algorithm containing a plurality of tasks
  • Fig. 2 shows a schedule of the tasks of the algorithm of
  • Fig. 1 assigned to a plurality of cores of a multi- core processor.
  • Fig. 1 shows a simplified model of an algorithm ALG consisting of tasks tO to t22.
  • Each of the tasks t ⁇ ,..., t22 has a specific duration which is indicated by reference numeral "delta".
  • Fig. 1 shows a dependency between some of the tasks.
  • the dependency between task tO and task t7 is indicated with dl .
  • task t7 has to wait for some input information from task tO.
  • tasks t6, t8 and tlO need input information from task t7.
  • the dependencies are outlined with d4, d5 and d6. In an appropriate manner further dependencies dl to d32 between further interdependent tasks are indicated.
  • Fig. 1 shows the algorithm after performing the step of cre- ating a model of the algorithm, thereby identifying the tasks tl,..., t22 of the algorithm ALG and identifying at least one characteristic of each of the tasks.
  • the characteristics used for optimization and shown in Fig. 1 are the duration of each of the tasks and dependencies to other tasks. However, addi- tional or alternative characteristics could be used as well.
  • the duration of the algorithm ALG depends on the task dependencies, on the task-to-core deployment and on task-to-priority mapping on the symmetric multi-core proces- sors, if priorities are assigned to each task.
  • a schedule of the algorithm ALG will be determined automatically by optimization algorithms or heuristics.
  • Applying a time-optimization to the algorithm shown in Fig. 1 by using e.g. eight cores of a multi-core processor results in an overall duration of 1850 ms .
  • the resulting task-to-core scheduling is outlined in Fig. 2.
  • each of the tasks tO,..., t22 is assigned to a specific core Pl,..., P8 : Task t22 is assigned to core Pl.
  • Tasks tl, tl9 and t4 are assigned to core P4.
  • Tasks t5, tl8 and til are assigned to core P5.
  • Tasks tO, t6, tl5 and tl7 are assigned to core P6.
  • Tasks t7, t8, t9, tl4 and t2 are assigned to core P7.
  • Tasks tl3 and t21 are assigned to core P2.
  • Tasks tl ⁇ , t20 and t3 are assigned to core P3.
  • Tasks tlO and tl2 are assigned to core P8.
  • the critical path i.e. the path which determines the overall time of the algorithm, comprises tasks tO (executed by core P6) , task tl (executed by core P4), task t22 (executed by core Pl) and task t3 (executed by core P3) .
  • tasks tO, tl, t22 and t3 By adding all of the time durations of these tasks tO, tl, t22 and t3 the overall time of 1850 ms of the algorithm is obtained.
  • Fig. 2 shows which of the tasks have to be executed on which of the cores of the multi-core processor. It is possible that all of the processors execute some of the tasks in parallel. Nevertheless, there may be an interruption, i.e. a pause, be- tween two consecutive tasks running on the same core. This depends on the determination of a starting point in time for each of the tasks of the algorithm.
  • any change to the algorithm structure e.g. any change with regard to task durations or changing of task dependencies or the available number of cores, results in triggering the time-optimization anew to determine the respective core-bound scheduling.
  • Applying optimization techniques to determine an explicit deployment and scheduling of tasks to cores exploits the increasing number of cores of multi-core processors. This mechanism allows adapting the computing power of multiple cores of a processor to the specific needs of a complex algorithm of interdependent tasks.
  • the scheduling solution mechanism proposed allows predicting best or acceptable operation times, the needed number of cores and the resulting core utilization.
  • Time-optimized and core-bound scheduling solutions per algorithm are applicable to any multi-core processors and even scheduling policies of a real-time or even a general-purpose operating system.
  • a scheduling policy of an operating system can vary the number of available cores it is able to provide to one of more parallel executed algorithms.
  • the proposed scheduling solution mechanism either results in better time-optimizations and less core utilization or vice versa, depending on the number of available cores.
  • a general purpose operating system can do dynamic scheduling of many algorithms or applications adaptively and can keep certain deterministic properties like non-starvation or the com- pletion order of running algorithms with respect to their starting times, durations and cores of the multi-core processor .
  • time-optimization techniques allow suggesting the appropriate scheduling solutions in an automated way.
  • Automated scheduling suggestions provide a necessary flexibility in figuring out the right core- scheduling in case of varying algorithm characteristics like task dependencies or single task durations.
  • time-optimized and core-bound scheduling solutions can be exploited even by the scheduling policy of a general purpose operating system in an adaptive way.
  • the scheduling policy can provide a starting algorithm with a maximum number of cores to use. This means, if a multi-core processor comprises for example of twelve cores only eight of them may be scheduled to a specific algorithm to ensure a predictable opera- tion of the algorithm.
  • This mechanism of adaptively allocating a number of cores to a starting algorithm and assigning its task to these reserved cores guarantees non-starvation and deterministic operation time of the algorithm. This applies even in time of task duration times differing from their values used for the time- optimization .
  • Time-optimized and core-bound scheduling policies are applicable to the whole range of multi-core systems either asymmetric, symmetric and clusters of different multi-core processors .
  • An asymmetric multi-core processor consists of one master core (PPE) and for example eight worker cores (SPEs) .
  • PPE master core
  • SPEs worker cores
  • An ex- ample an asymmetric multi-core processor is the so-called IBM cell in which the master core comprises the operating system, whereas the worker cores are not under control of the operating system.
  • Time-optimized and core-bound algorithm schedul- ing solutions are exploited for a given number of cores.
  • the algorithm can be scheduled programmatically in an explicit way according to its time-optimized and SPE-bound computed scheduling solution.
  • time-optimized and core-bound scheduling solutions can be applied in an adaptive way.
  • the suggested scheduling solution provides the best-possible optimized overall operation time with probably limited core utilization.
  • reducing the number of cores results in better core utilization and slower optimized algorithm times.
  • the operating system on an 80-core processor could be more generous with respect to available cores per algorithm in case of fewer running other algorithms and applications, respectively. This results in short (time- optimized) operation times of algorithms and applications, respectively, thus making cores available as early as possible.
  • algorithms and applications, respectively, repeating with a certain periodicity could be offered less cores to be utilized efficiently due to the algorithm longevity.
  • the operating system has to context switch the state of all cores used by an algorithm and let another algorithm use this pre-defined number of cores.
  • Time-optimized and core-bound algorithm scheduling solutions are used by considering a given number of cores and a valid task-to-processor core assignment.
  • the algorithm can be scheduled programmatically in an explicit way according to its time-optimized and machine-and-core-bound computed scheduling solution.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention describes a method for scheduling a predictable operation of an algorithm (ALG) on a multi-core processor comprising a plurality of parallel working cores (P1,..., P8), comprising the steps of - creating a model of the algorithm (ALG), thereby identifying tasks (t1,...,t22) of the algorithm (ALG) and at least one characteristic of each of the tasks (t1,..., t22); assigning each identified task (t1,...,t22) according to its at least one characteristic to at least some of the plurality of cores (P1,..., P8) of the multi-core processor; and determining a starting point in time for each of the tasks (t1,...,t22) of the algorithm (ALG).

Description

Description
Method and device for scheduling a predictable operation of an algorithm on a multi-core processor
The invention relates to a method for scheduling a predictable operation of an algorithm on a multi-core processor comprising a plurality of parallel working cores. The invention further relates to a device for creating a schedule for a predictable operation of an algorithm on a multi-core processor comprising a plurality of parallel working cores. The invention further relates to a computer program product.
Normally, an algorithm comprises a plurality of different tasks, each task having a specific duration. Furthermore, there are task dependencies between some of the tasks. These tasks are called interdependent tasks. The operation time of an algorithm depends on many aspects, especially the task dependencies, task durations and a so-called domain-motivated explicit mapping of tasks or task groups to certain cores of the multi-core processor. Improving the operation time of an algorithm implies a better data-throughput or a better task scalability in general. Especially, real-time or life- processing algorithms which are used for medical instruments need feasible run-time predictions with regard to their operation time.
Even in case that run-time predictions are not needed the dynamic scheduling of an algorithm with interdependent tasks has to be based on their task dependencies and durations. The tasks of the algorithm have to be executed in sequence of their specified dependencies. In addition, there has to be a certain kind of fairness with respect to all dispatched algorithms processed by the multi-core processor at a certain time. By executing a plurality of algorithms in parallel, it has to be guaranteed that no previous algorithm starves out. On the contrary, an algorithm should be able to be completed before a new or the same algorithm starts. By now, when using so-called symmetric multi-core processors (as provided by INTEL or AMD) task scheduling is under the control of the operating system. A symmetric multi-core proc- essor means that all cores of the multi-core processor have the same capabilities. Even real-time operating systems, like QNX, schedule their execution units, i.e. tasks, threads or processes, in a transparent way according to priorities, task states and queues. Executing an algorithm with a real-time operating system therefore makes it necessary to assign a priority to each task of the algorithm. Several approaches are known how to map task to priorities in order to derive a feasible and predictable algorithm duration. Unfortunately, such a mapping has to be done manually by the programmer lev- eraging his experience and knowledge. Furthermore, this mechanism is hardly applicable to algorithms having interdependent tasks. A further problem concerns task starvation and priority inversion, among others.
Even when using so-called asymmetric multi-core processors or a combination of different multi-core units, like the so- called IBM Cell Broadcasting Engine (CBE) and graphical processing units (GPUs) , the scheduling of the tasks of an algorithm has to be done manually. The scheduling result has to be programmed explicitly. Furthermore, dynamic scheduling of algorithms on a general purpose operation system (e.g. Windows, LINUX) does not provide any guarantee with respect to completion order or completion time.
Especially in case of medical applications consisting of interdependent tasks all of the above mentioned suggestions how to treat algorithms with a plurality of independent tasks are not suitable since a predictable operation of the algorithm is necessary.
It is therefore an object of the present invention to provide a method for scheduling a predictable operation of an algorithm on a multi-core processor. It is a further object of the present invention to provide a device for creating a schedule for a predictable operation of an algorithm on a multi-core processor. Furthermore, a computer program product directly loadable into the internal memory of a digital com- puter has to be provided.
These objects are solved by a method according to claim 1, a computer program product according to claim 11 and a device according to claim 12. Preferred embodiments of the invention are set out in the dependent claims.
A method for scheduling a predictable operation of an algorithm on a multi-core processor, the multi-core processor comprising a plurality of parallel working cores, comprises the steps of creating a model of the algorithm, thereby identifying tasks of the algorithm and at least one characteristic of each of the tasks; assigning each identified task according to its at least one characteristic to at least some of the plurality of cores of the multi-core processor; and determining a starting point in time for each of the tasks of the algorithm.
Scheduling a predictable operation means that the overall time of the algorithm, the needed number of cores for executing the algorithm and the resulting core utilization of the number of cores is identical with each execution of the algorithm.
The invention is based on the knowledge that the operation time of an algorithm is highly dependent on how tasks of an algorithm are mapped to the cores of a multi-core processor. Creating a model of the algorithm allows identifying paral- lelization options, parallel tasks and the data exchange pattern between these tasks in order to exploit the power of the multi-core processor. The schedule for the algorithm allows a predictable operation regardless if the algorithm is executed on a multi-core processor under control of a general purpose operating system or a real-time operating system. Therefore, the method according to the invention allows providing algorithms which can be used for critical applications, e.g. medical measurements. An advantage over the prior art is that the method according to the invention can be carried out automatically by using a computer system instead of a manual or brute-force approach. Therefore, the method according to the invention may be used for algorithms with changing tasks, changing dependencies and durations because scheduling a new predictable operation of the algorithm can be done very fast and even dynamically.
According to a further improvement of the invention, the at least one characteristic of each of the tasks comprises one or more of the following parameters: dependencies between the tasks; duration of a task; data exchange between the tasks. By considering these characteristics of each of the tasks it is possible to provide a schedule for a predictable operation of the algorithm.
According to a further embodiment, the step for creating a model of the algorithm comprises identifying properties of the algorithm. The properties of the algorithm to be consid- ered comprise one or more of the following parameters: preferred or needed cores for a specific task; a group of tasks which has to be executed by one specific core. The preferred or needed cores for a specific task are called core candidates per task. A group of tasks which has to be executed by one specific core is named a group of tasks to be deployed to one core.
According to a further embodiment, the number of cores to be used for executing the algorithm is predetermined according to an overall time-optimization of the algorithm. Additionally or alternatively, the number of cores to be used for executing the algorithm is predetermined according to an optimized core utilization of at least one of the cores of the multi-core processor. Which of these two side conditions will be used for assigning each identified task to the cores of the multi-core processor depends on the amount of cores of the multi-core processor to be used, the number of algorithms which will be executed in parallel on the processor and the operation system for controlling the multi-core processor.
The step of creating the model of the algorithm may be carried out based on a computer. Alternatively, it is also pos- sible that the model of the algorithm is created by the programmer. The step of creating the model of the algorithm comprises identifying tasks, their dependencies, data exchanges, task durations, core candidates per task and task groups to be deployed to one core. With this information an optimized algorithm can be exploited to suggest the right scheduling of tasks to available cores.
The step of assigning the identified tasks to at least some of the amount of cores of the multi-core processor is carried out by an optimization algorithm or by a heuristics. Thereby, use of known optimization algorithms and heuristics is possible, like the optimization methods or heuristics "Simplex", "Simulated Annealing" or "Best-First Searches".
According to a further embodiment the step of assigning the identified tasks to at least some of the amount of cores of the multi-core processor is carried out dependent on the properties of an operating system controlling the multi-core processor. Use of a general purpose operating system or a real-time operating system can be made to provide an optimized schedule of the algorithm.
The steps of assigning the identified tasks to at least some of the amount of cores of the multi-core processor and of de- termining the critical date for each of the tasks of the algorithm is carried out during operation of the multi-core processor to provide adaptive dynamic scheduling of the algorithm. With other words, providing the schedule of the algo- rithm can be done during executing the algorithm thereby considering possible other algorithms being executed on the multi-core processor.
According to a second aspect of the invention a computer program product directly loadable into the internal memory of a digital computer is suggested, the computer program product comprising software code portions for performing the steps of the method according to the invention when said product is run on a computer.
According to a third aspect of the invention a device for creating a schedule for a predictable operation of an algorithm on a multi-core processor comprising a plurality of parallel working cores is suggested. The device comprises a first means for creating a model of the algorithm, thereby identifying tasks of the algorithm and at least one characteristic of each of the tasks; a second means for assigning each identified task according to its at least one character- istic to at least some of the plurality of cores of the multi-core processor; and a third means for determining a starting point in time for each of the tasks of the algorithm.
A device according to the invention has the same advantages as pointed out in connection with the described method.
Furthermore, the device according to the invention comprises further means for executing the method steps as set out above.
The invention will be described in more detail by reference to the figures.
Fig. 1 shows a simplified already analyzed model of an algorithm containing a plurality of tasks, and Fig. 2 shows a schedule of the tasks of the algorithm of
Fig. 1 assigned to a plurality of cores of a multi- core processor.
Fig. 1 shows a simplified model of an algorithm ALG consisting of tasks tO to t22. Each of the tasks tθ,..., t22 has a specific duration which is indicated by reference numeral "delta". For instance, task tO has a duration of delta = 200 ms . Task t7 has a duration of delta = 50 ms . Furthermore, Fig. 1 shows a dependency between some of the tasks. For example the dependency between task tO and task t7 is indicated with dl . This means, task t7 has to wait for some input information from task tO. Accordingly, tasks t6, t8 and tlO need input information from task t7. The dependencies are outlined with d4, d5 and d6. In an appropriate manner further dependencies dl to d32 between further interdependent tasks are indicated.
Fig. 1 shows the algorithm after performing the step of cre- ating a model of the algorithm, thereby identifying the tasks tl,..., t22 of the algorithm ALG and identifying at least one characteristic of each of the tasks. The characteristics used for optimization and shown in Fig. 1 are the duration of each of the tasks and dependencies to other tasks. However, addi- tional or alternative characteristics could be used as well.
In general, the duration of the algorithm ALG depends on the task dependencies, on the task-to-core deployment and on task-to-priority mapping on the symmetric multi-core proces- sors, if priorities are assigned to each task.
To provide a predictable time-optimized and core-bound operation of the algorithm ALG a schedule of the algorithm ALG will be determined automatically by optimization algorithms or heuristics. Applying a time-optimization to the algorithm shown in Fig. 1 by using e.g. eight cores of a multi-core processor results in an overall duration of 1850 ms . The resulting task-to-core scheduling is outlined in Fig. 2. As can be seen from Fig. 2 each of the tasks tO,..., t22 is assigned to a specific core Pl,..., P8 : Task t22 is assigned to core Pl. Tasks tl, tl9 and t4 are assigned to core P4. Tasks t5, tl8 and til are assigned to core P5. Tasks tO, t6, tl5 and tl7 are assigned to core P6. Tasks t7, t8, t9, tl4 and t2 are assigned to core P7. Tasks tl3 and t21 are assigned to core P2. Tasks tlβ, t20 and t3 are assigned to core P3. Tasks tlO and tl2 are assigned to core P8.
Furthermore, the dependencies dl,..., d32 which are outlined in Fig. 1 are shown in Fig. 2, too. As can be seen from Fig. 2, the critical path, i.e. the path which determines the overall time of the algorithm, comprises tasks tO (executed by core P6) , task tl (executed by core P4), task t22 (executed by core Pl) and task t3 (executed by core P3) . By adding all of the time durations of these tasks tO, tl, t22 and t3 the overall time of 1850 ms of the algorithm is obtained. Since task t3 is dependent from task t22, task t22 is dependent from task tl and task tl is dependent from task tO a reduc- tion of time compared to the overall time of 1850 ms is not possible. On the other hand, if the execution of these four tasks tO, tl, t22 and t3 is not optimized a prolongation in the overall time of execution for the algorithm ALG will be the result.
Fig. 2 shows which of the tasks have to be executed on which of the cores of the multi-core processor. It is possible that all of the processors execute some of the tasks in parallel. Nevertheless, there may be an interruption, i.e. a pause, be- tween two consecutive tasks running on the same core. This depends on the determination of a starting point in time for each of the tasks of the algorithm.
A person skilled in the art will recognize that any change to the algorithm structure, e.g. any change with regard to task durations or changing of task dependencies or the available number of cores, results in triggering the time-optimization anew to determine the respective core-bound scheduling. Applying optimization techniques to determine an explicit deployment and scheduling of tasks to cores exploits the increasing number of cores of multi-core processors. This mechanism allows adapting the computing power of multiple cores of a processor to the specific needs of a complex algorithm of interdependent tasks. The scheduling solution mechanism proposed allows predicting best or acceptable operation times, the needed number of cores and the resulting core utilization.
Time-optimized and core-bound scheduling solutions per algorithm are applicable to any multi-core processors and even scheduling policies of a real-time or even a general-purpose operating system. Depending on the current system and core load, respectively, a scheduling policy of an operating system can vary the number of available cores it is able to provide to one of more parallel executed algorithms. The proposed scheduling solution mechanism either results in better time-optimizations and less core utilization or vice versa, depending on the number of available cores. This way a general purpose operating system can do dynamic scheduling of many algorithms or applications adaptively and can keep certain deterministic properties like non-starvation or the com- pletion order of running algorithms with respect to their starting times, durations and cores of the multi-core processor .
By modelling and specifying, respectively, the structure of an algorithm in terms of its tasks, task dependencies, task durations, core candidates of each tasks and one-core task groups (as pointed out in Fig. 1) time-optimization techniques allow suggesting the appropriate scheduling solutions in an automated way. Automated scheduling suggestions provide a necessary flexibility in figuring out the right core- scheduling in case of varying algorithm characteristics like task dependencies or single task durations. In addition to a real-time operating system time-optimized and core-bound scheduling solutions can be exploited even by the scheduling policy of a general purpose operating system in an adaptive way. Depending on the current system load and the remaining number of available cores, the scheduling policy can provide a starting algorithm with a maximum number of cores to use. This means, if a multi-core processor comprises for example of twelve cores only eight of them may be scheduled to a specific algorithm to ensure a predictable opera- tion of the algorithm.
The more cores an algorithm is able to use, the better overall time-optimization is possible. But the core utilization is certainly not on maximum level. In contrast, the fewer cores the algorithm is allowed to use, the better core utilization is possible. However, there is the drawback of longer overall operation times despite a time-optimization.
This mechanism of adaptively allocating a number of cores to a starting algorithm and assigning its task to these reserved cores guarantees non-starvation and deterministic operation time of the algorithm. This applies even in time of task duration times differing from their values used for the time- optimization .
Time-optimized and core-bound scheduling policies are applicable to the whole range of multi-core systems either asymmetric, symmetric and clusters of different multi-core processors .
Asymmetric multi-core processor
An asymmetric multi-core processor consists of one master core (PPE) and for example eight worker cores (SPEs) . An ex- ample an asymmetric multi-core processor is the so-called IBM cell in which the master core comprises the operating system, whereas the worker cores are not under control of the operating system. Time-optimized and core-bound algorithm schedul- ing solutions are exploited for a given number of cores. The algorithm can be scheduled programmatically in an explicit way according to its time-optimized and SPE-bound computed scheduling solution.
Symmetric multi-core processors
By using a real-time operating system it is possible to leverage the predictability property of time-based, i.e. time- optimized, and core-bound scheduling.
On a general purpose operating system, such as Windows or LINUX, time-optimized and core-bound scheduling solutions can be applied in an adaptive way. By assuming a high number of available cores, the suggested scheduling solution provides the best-possible optimized overall operation time with probably limited core utilization. On the other hand, reducing the number of cores results in better core utilization and slower optimized algorithm times.
For instance, the operating system on an 80-core processor could be more generous with respect to available cores per algorithm in case of fewer running other algorithms and applications, respectively. This results in short (time- optimized) operation times of algorithms and applications, respectively, thus making cores available as early as possible. On the other hand, algorithms and applications, respectively, repeating with a certain periodicity could be offered less cores to be utilized efficiently due to the algorithm longevity. In case of no more available cores, the operating system has to context switch the state of all cores used by an algorithm and let another algorithm use this pre-defined number of cores.
Cluster of different multi-core machines
Time-optimized and core-bound algorithm scheduling solutions are used by considering a given number of cores and a valid task-to-processor core assignment. The algorithm can be scheduled programmatically in an explicit way according to its time-optimized and machine-and-core-bound computed scheduling solution.

Claims

Patent Claims
1. A method for scheduling a predictable operation of an algorithm (ALG) on a multi-core processor comprising a plural- ity of parallel working cores (Pl,..., P8), comprising the steps of creating a model of the algorithm (ALG) , thereby identifying tasks (tl,...,t22) of the algorithm (ALG) and at least one characteristic of each of the tasks (tl, ..., t22) ; assigning each identified task (tl,...,t22) according to its at least one characteristic to at least some of the plurality of cores (Pl,..., P8) of the multi-core processor; and - determining a starting point in time for each of the tasks (tl,...,t22) of the algorithm (ALG).
2. Method according to claim 1, wherein the at least one characteristic of each of the tasks (tl,...,t22) comprises one or more of the following parameters: dependencies (dl,...,d32) between the tasks (tl, ...,t22) , duration (delta) of a task (tl, ... , t22) , data exchange between the tasks (tl, ... , t22) .
3. Method according to claim 1 or 2, wherein the stop of creating a model of the algorithm (ALG) comprises identifying properties of the algorithm.
4. Method according to claim 3, wherein the properties of the algorithm (ALG) to be considered comprise one or more of the following parameters: preferred or needed cores (Pl,..., P8) for a specific task (tl, ... , t22) , - a group of tasks (tl,...,t22) which has to be executed by one specific core (Pl,..., P8).
5. Method according to one of the preceding claims, wherein the number of cores (Pl,..., P8) to be used for executing the algorithm (ALG) is predetermined according to an overall time-optimization of the algorithm (ALG) .
6. Method according to one of the preceding claims, wherein the number of cores (Pl,..., P8) to be used for executing the algorithm (ALG) is predetermined according to an optimized core utilization of at least one of the cores (Pl,..., P8) of the multi-core processor.
7. Method according to one of the preceding claims, wherein the step of creating the model of the algorithm (ALG) is carried out based on a computer.
8. Method according to one of the preceding claims, wherein the step of assigning the identified tasks (tl,...,t22) to at least some of the amount of cores (Pl,..., P8) of the multi- core processor is carried out by an optimization algorithm (ALG) or by a heuristics.
9. Method according to one of the preceding claims, wherein the step of assigning the identified tasks (tl,...,t22) to at least some of the amount of cores of the multi-core processor is carried out dependent on the properties of an operating system controlling the multi-core processor.
10. Method according to one of the preceding claims, wherein the steps of assigning the identified tasks (tl,...,t22) to at least some of the amount of cores (Pl,..., P8) of the multi-core processor and of determining the critical date for each of the tasks (tl,...,t22) of the algorithm (ALG) is carried out during operation of the multi-core processor to provide adaptive dynamic scheduling of the algorithm (ALG) .
11. Computer program product directly loadable into the internal memory of a digital computer, comprising software code portions for performing the steps of one of the preceding claims when said product is run on a computer.
12. A device for creating a schedule for a predictable opera- tion of an algorithm (ALG) on a multi-core processor comprising an plurality of parallel working cores (Pl,..., P8), comprising: a first means for creating a model of the algorithm (ALG), thereby identifying tasks (tl,...,t22) of the al- gorithm (ALG) and at least one characteristic of each of the tasks (tl, ... , t22) ; a second means for assigning each identified task (tl,...,t22) according to its at least one characteristic to at least some of the plurality of cores (Pl,..., P8) of the multi-core processor; and a third means for determining a starting point in time for each of the tasks (tl,...,t22) of the algorithm (ALG) .
13. Device according to claim 12, which comprises further means for executing the method steps according to claims 2 to 10.
PCT/EP2008/055910 2007-06-05 2008-05-14 Method and device for scheduling a predictable operation of an algorithm on a multi-core processor Ceased WO2008148625A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP07011068 2007-06-05
EP07011068.9 2007-06-05

Publications (1)

Publication Number Publication Date
WO2008148625A1 true WO2008148625A1 (en) 2008-12-11

Family

ID=39734151

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2008/055910 Ceased WO2008148625A1 (en) 2007-06-05 2008-05-14 Method and device for scheduling a predictable operation of an algorithm on a multi-core processor

Country Status (1)

Country Link
WO (1) WO2008148625A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2505273A (en) * 2012-08-21 2014-02-26 Lenovo Singapore Pte Ltd Task scheduling in a multi-core processor with different size cores, by referring to a core signature of the task.
EP2612235A4 (en) * 2010-09-03 2014-03-05 Siemens Ag METHOD OF PARRALLELIZING AUTOMATIC CONTROL PROGRAMS AND COMPILER
CN108351815A (en) * 2015-11-12 2018-07-31 西门子股份公司 Method for running multi-core processor
CN117215801A (en) * 2023-11-07 2023-12-12 北京数渡信息科技有限公司 On-chip load performance optimizing device suitable for multi-core processor

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
BABAOGLU O ET AL: "Mapping parallel computations onto distributed systems in Paralex", COMPEURO '91. ADVANCED COMPUTER TECHNOLOGY, RELIABLE SYSTEMS AND APPLI CATIONS. 5TH ANNUAL EUROPEAN COMPUTER CONFERENCE. PROCEEDINGS. BOLOGNA, ITALY 13-16 MAY 1991, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 13 May 1991 (1991-05-13), pages 123 - 130, XP010022845, ISBN: 978-0-8186-2141-3 *
COLES J: "A User Interface for DAG Scheduling Algorithms", MANCHESTER UNIVERSITY, vol. -, no. -, 2 May 2007 (2007-05-02), pages 1 - 31, XP002495491 *
JUAN R PIMENTEL ED - JUAN R PIMENTEL: "An Incremental Approach to Task and Message Scheduling for AUTOSAR Based Distributed Automotive Applications", SOFTWARE ENGINEERING FOR AUTOMOTIVE SYSTEMS, 2007. ICSE WORKSHOPS SEAS '07. FOURTH INTERNATIONAL WORKSHOP ON, IEEE, PI, 1 May 2007 (2007-05-01), pages 1 - 1, XP031175845, ISBN: 978-0-7695-2968-4 *
RONNGREN S ET AL: "Static multiprocessor scheduling of periodic real-time tasks with precedence constraints and communication costs", SYSTEM SCIENCES, 1995. VOL. III,. PROCEEDINGS OF THE TWENTY-EIGHTH HAW AII INTERNATIONAL CONFERENCE ON WAILEA, HI, USA 3-6 JAN. 1995, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, 3 January 1995 (1995-01-03), pages 143 - 152, XP010128247, ISBN: 978-0-8186-6935-4 *
YU-KWONG KWOK ET AL: "Static scheduling algorithms for allocating directed task graphs to multiprocessors", ACM COMPUTING SURVEYS, ACM, NEW YORK, NY, US, US, vol. 31, no. 4, 1 December 1999 (1999-12-01), pages 406 - 471, XP002461554, ISSN: 0360-0300 *
ZHENG W: "Workflow Scheduling Simulation Demo (Screenshots)", MANCHESTER UNIVERSITY, vol. -, no. -, September 2006 (2006-09-01), pages 1 - 12, XP002495492, Retrieved from the Internet <URL:http://www.cs.man.ac.uk/~zhengw/demo/demo.html> *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2612235A4 (en) * 2010-09-03 2014-03-05 Siemens Ag METHOD OF PARRALLELIZING AUTOMATIC CONTROL PROGRAMS AND COMPILER
GB2505273A (en) * 2012-08-21 2014-02-26 Lenovo Singapore Pte Ltd Task scheduling in a multi-core processor with different size cores, by referring to a core signature of the task.
GB2505273B (en) * 2012-08-21 2015-01-07 Lenovo Singapore Pte Ltd Task scheduling in big and little cores
US9619282B2 (en) 2012-08-21 2017-04-11 Lenovo (Singapore) Pte. Ltd. Task scheduling in big and little cores
CN108351815A (en) * 2015-11-12 2018-07-31 西门子股份公司 Method for running multi-core processor
CN117215801A (en) * 2023-11-07 2023-12-12 北京数渡信息科技有限公司 On-chip load performance optimizing device suitable for multi-core processor
CN117215801B (en) * 2023-11-07 2024-01-23 北京数渡信息科技有限公司 On-chip load performance optimizing device suitable for multi-core processor

Similar Documents

Publication Publication Date Title
EP2176751B1 (en) Scheduling by growing and shrinking resource allocation
US9436510B2 (en) System and method for managing the interleaved execution of threads
CN105808328B (en) The methods, devices and systems of task schedule
US8875146B2 (en) Systems and methods for bounding processing times on multiple processing units
US9262220B2 (en) Scheduling workloads and making provision decisions of computer resources in a computing environment
Massa et al. Outstanding paper: Optimal and adaptive multiprocessor real-time scheduling: The quasi-partitioning approach
CN106462452B (en) vehicle control device
Qiu et al. Cost-minimizing preemptive scheduling of mapreduce workloads on hybrid clouds
KR20230064963A (en) Method and apparatus for resource allocation in cluster computing system
WO2008148625A1 (en) Method and device for scheduling a predictable operation of an algorithm on a multi-core processor
Mody et al. Smart round robin CPU scheduling algorithm for operating systems
KR102022972B1 (en) Runtime management apparatus for heterogeneous multi-processing system and method thereof
CN117311899A (en) Task processing, task scheduling method, computing device and computer storage medium
WO2008148624A1 (en) Method and device for providing a schedule for a predictable operation of an algorithm on a multi-core processor
Groesbrink et al. A criticality-aware mapping of real-time virtual machines to multi-core processors
JP6156379B2 (en) Scheduling apparatus and scheduling method
CN118981360A (en) Task scheduling method, device, storage medium, system and program product
JP2021060923A (en) Vehicle control device
CN112506640B (en) Multiprocessor architecture for encryption operation chip and allocation method
CN117806789A (en) Task processing method and device of multi-core operating system and computing device
Mishra et al. Dynamic task scheduling on multicore automotive ECUs
WO2025065274A1 (en) Dynamic time partitioning for computation-communication chains
Oberholzer Scheduling for MIG-capable GPUs: Accelerator-aware operating system scheduling
CN120407127B (en) Task scheduling method, scheduling device, electronic device, storage medium and product
Liu Efficient design, analysis, and implementation of complex multiprocessor real-time systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08759591

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08759591

Country of ref document: EP

Kind code of ref document: A1