schedtool
Copyright (C) 2002-2006 Freek
Released under the GPL, version 2 (see LICENSE)
Use at your own risk.
Inspired by setbatch (C) 2002 Ingo Molnar
Suggestions are welcome.
CONTENT:
--------
-About
-Usage / description of schedtool
-A complex example
-Static Priority
-Policies reviewed
-Final words / contact
-Thanks
-Appendix A: An introduction to Multi-Level-Feedback-Queue scheduling
ABOUT:
------
schedtool was born because there was no tool to query or change
all of Linux's CPU-scheduling policies in one handy command.
Support for CPU affinity has also been added, and most recently
(re-)nicing of processes.
Thus, schedtool is the definitive interface to Linux's scheduler.
It can be used to avoid skipping in A/V applications, to lock
processes onto certain CPUs on SMP/NUMA systems (which may be
beneficial for networking or benchmarks), or to adjust the nice levels
of less important jobs to keep the system responsive under high load.
All output, even errors, goes to STDOUT to ease piping.
If you don't know about scheduling policies, you probably don't want to
use this program yet; learn about them first by reading "man sched_setscheduler".
Certain modes (as of this writing: SCHED_IDLEPRIO and SCHED_ISO) need a
patched kernel. See INSTALL for details.
USAGE:
------
There are 3 operation modes: query, set, and execute a new process.
QUERY PROCESS(ES):
#> schedtool <LIST_OF_PIDs>
This will print all information it can obtain for the processes with
<PIDs>.
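To show what kind of information a query gathers, the sketch below collects it for one PID through Python's os module, which wraps the Linux calls sched_getscheduler(2), sched_getparam(2), sched_getaffinity(2) and getpriority(2). It is not schedtool's own code; the helper name query() and the dictionary layout are hypothetical.

```python
import os

# Map Linux policy numbers to the names used in this README.
# SCHED_NORMAL is called SCHED_OTHER in userspace headers and in Python.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_NORMAL",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
    os.SCHED_BATCH: "SCHED_BATCH",
}


def query(pid):
    """Return policy, static priority, nice level and CPU affinity of `pid`
    (hypothetical helper, roughly the data `schedtool <PID>` reports)."""
    policy = os.sched_getscheduler(pid)
    return {
        "pid": pid,
        # Policies without a known name (e.g. from patched kernels) stay numeric.
        "policy": POLICY_NAMES.get(policy, "unknown (%d)" % policy),
        "static_prio": os.sched_getparam(pid).sched_priority,
        "nice": os.getpriority(os.PRIO_PROCESS, pid),
        "affinity": sorted(os.sched_getaffinity(pid)),
    }
```

For an ordinary program started from a shell, query(os.getpid()) typically reports "SCHED_NORMAL" with static priority 0.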
SET PROCESS(ES):
I) scheduling policy (detailed discussion in "POLICY OVERVIEW")
#> schedtool -<MODE> <LIST_OF_PIDs>
where <MODE> is one of:
N or 0: for SCHED_NORMAL
F or 1: for SCHED_FIFO
R or 2: for SCHED_RR
B or 3: for SCHED_BATCH
I or 4: for SCHED_ISO
example:
#> schedtool -B <PIDs>
II) static priority
A static priority is mandatory for SCHED_FIFO and SCHED_RR.
STATIC_PRIO is a number from 1 to 99; higher values mean higher priority in
that scheduling class (relative to other processes in the same class).
example:
#> schedtool -R -p 20 <PIDs>
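Setting a policy and a static priority boils down to one sched_setscheduler(2) call per PID. The sketch below shows that mapping with Python's os module; MODES and set_policy() are hypothetical names, and SCHED_ISO is left out because it exists only in patched kernels and Python has no constant for it.

```python
import os

# Mode letters/numbers from the list above (SCHED_ISO omitted: patch-only).
MODES = {
    "N": os.SCHED_OTHER, "0": os.SCHED_OTHER,  # SCHED_NORMAL
    "F": os.SCHED_FIFO, "1": os.SCHED_FIFO,
    "R": os.SCHED_RR, "2": os.SCHED_RR,
    "B": os.SCHED_BATCH, "3": os.SCHED_BATCH,
}


def set_policy(pids, mode, static_prio=0):
    """Apply policy and static priority to each PID (hypothetical helper,
    like `schedtool -<MODE> -p PRIO <PIDs>`); return the errors per PID.

    SCHED_FIFO/SCHED_RR normally need root (CAP_SYS_NICE) or RLIMIT_RTPRIO.
    """
    policy = MODES[mode]
    param = os.sched_param(static_prio)
    errors = {}
    for pid in pids:
        try:
            os.sched_setscheduler(pid, policy, param)
        except OSError as exc:
            errors[pid] = exc
    return errors
```

set_policy([1234], "R", 20) would be the counterpart of `schedtool -R -p 20 1234`, where 1234 is a placeholder PID.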
III) CPU-affinity
example:
#> schedtool -a 0x3 <PIDs>
IV) nice-level
example:
#> schedtool -n 10 <PIDs>
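The nice level is an ordinary setpriority(2) value. A minimal Python sketch of the same operation for a list of PIDs, using a hypothetical helper renice():

```python
import os


def renice(pids, level):
    """Set the absolute nice level of each PID (hypothetical helper,
    analogous to `schedtool -n LEVEL <PIDs>`).

    Raising the nice value (lower priority) is always allowed for your own
    processes; lowering it usually needs root or a suitable RLIMIT_NICE.
    """
    for pid in pids:
        os.setpriority(os.PRIO_PROCESS, pid, level)
```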
Of course you can combine policy with affinity and nice in one call.
EXECUTE A NEW PROCESS:
example:
#> schedtool [SCHED_PARAMETERS_LIKE_ABOVE] -e command -arg1 -arg2 file
This executes "command -arg1 -arg2 file" exactly as typing it at the
prompt would, but with the given scheduling parameters applied.
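Scheduling policy, nice level and CPU affinity are inherited across fork() and preserved across exec(), so setting them just before exec() is enough to start a program with them. The Python sketch below does this with subprocess and a preexec_fn, which runs in the child between fork() and exec(); run_with() and its parameters are hypothetical, and only nice level and affinity are shown.

```python
import os
import subprocess


def run_with(cmd, nice=None, cpus=None):
    """Run `cmd` (a list of arguments) with the given nice level and CPU set
    (hypothetical helper, comparable to `schedtool -n NICE -a CPUS -e cmd`)."""

    def apply():
        # Runs in the child after fork() and before exec(); the settings
        # survive exec(), so the new program starts with them.
        if nice is not None:
            os.setpriority(os.PRIO_PROCESS, 0, nice)
        if cpus is not None:
            os.sched_setaffinity(0, cpus)

    return subprocess.run(cmd, preexec_fn=apply, check=True,
                          capture_output=True, text=True)
```

For instance, run_with(["mplayer", "file.avi"], nice=5, cpus={1}) would start mplayer at nice 5, restricted to CPU1.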
CPU-affinity:
To give PIDs or a command a certain CPU affinity, use the -a switch.
The value is used as a simple bitmask: a bit set to 1 means the
PID may run on that CPU, a bit unset (0) means it MUST NOT.
The resulting value is the bitwise OR of the single values for each CPU.
CPU0 (the first CPU in your system) is denoted by the least significant bit
(here, the rightmost one). For illustration, the following picture uses
only 16 bits.
CPU 0 ------------,
CPU 1 -----------,|
...              ||
mask             VV    means                   value == dec
--------------------------------------------------------------------------
0000 0000 0000 0001 -> run only on CPU0     -> 0x1    == 1
0000 0000 0000 1001 -> run on CPU0 AND CPU3 -> 0x9    == 9
0000 0000 0000 1111 -> run on CPU0-CPU3     -> 0xF    == 15
1111 1111 1111 1111 -> run on CPU0-CPU15    -> 0xFFFF == 65535 (2^16-1)
To set it back to the default (PID may run on all CPUs), use the mask
0xFFFFFFFF (the kernel automatically reduces it to the CPUs actually present).
As a short mnemonic rule, each 'F' denotes a set of 4 CPUs
(0xF: CPUs 0-3, 0xFF: CPUs 0-7, and so on).
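The bit arithmetic in the table can be checked with a few lines of Python. cpus_from_mask() and mask_from_cpus() are hypothetical helpers; a CPU list like the one cpus_from_mask() returns is also the form Python's os.sched_setaffinity() expects, since it takes CPU numbers rather than a bitmask.

```python
def cpus_from_mask(mask):
    """List the CPUs whose bit is set; bit 0 (least significant) is CPU0."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def mask_from_cpus(cpus):
    """Build the bitmask as the bitwise OR of 1 << cpu for each CPU."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask
```

cpus_from_mask(0x9) returns [0, 3], matching the second row of the table.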
Since version 1.1.1, a list mode is supported as well, which lets you
specify the target CPUs without bit-juggling. Separate the
CPUs with a ',':
Run on CPU0 and CPU1:
#> schedtool -a 0,1 <PIDs>
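A list such as "0,1" maps directly onto the CPU set taken by sched_setaffinity(2). A minimal Python sketch, assuming only the comma-separated form described above (no ranges); parse_cpu_list() and set_affinity() are hypothetical helpers.

```python
import os


def parse_cpu_list(spec):
    """Parse a comma-separated CPU list such as "0,1" into a set of CPUs."""
    cpus = set()
    for item in spec.split(","):
        item = item.strip()
        if not item.isdigit():
            raise ValueError("not a CPU number: %r" % item)
        cpus.add(int(item))
    return cpus


def set_affinity(pids, spec):
    """Pin each PID to the listed CPUs (hypothetical helper, like -a 0,1)."""
    cpus = parse_cpu_list(spec)
    for pid in pids:
        os.sched_setaffinity(pid, cpus)
```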
A COMPLEX EXAMPLE:
------------------
#> schedtool -R -p 50 -a 0x2 -e mplayer file.avi
Execute mplayer file.avi with
-SCHED_RR,
-static priority 50,
-affinity 0x2 (run only on CPU1).
ABOUT STATIC PRIORITY:
----------------------
Static priority is something completely different from the nice-level. The
nice-level is added to the dynamic priority, and the higher it gets, the more
the process is "punished" ([2]). The static priority, in contrast, is used to
pick the next process to run in the current scheduling class: the higher it
is, the more the process is preferred >in general< over others, e.g. when
it becomes ready after a blocking action. It may also preempt
another, lower-priority process.
STATIC_PRIO can't be assigned to SCHED_NORMAL or SCHED_BATCH. schedtool
won't prevent you from trying (it only prints a warning; think UNIX), but you
will probably get an error later from the call that sets the policy.
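The rule can be written as a small check that a wrapper could run before calling sched_setscheduler(2). The sketch hard-codes the 1-99 range given earlier for SCHED_FIFO and SCHED_RR; static_prio_ok() is a hypothetical name. When the kernel itself rejects a mismatch, Python's os.sched_setscheduler() raises OSError with errno EINVAL.

```python
import os

REALTIME = {os.SCHED_FIFO, os.SCHED_RR}


def static_prio_ok(policy, prio):
    """Real-time policies need a static priority of 1..99; the
    non-real-time ones (NORMAL, BATCH) accept only 0."""
    if policy in REALTIME:
        return 1 <= prio <= 99
    return prio == 0
```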
Versions 1.2.4 and later support a probe mode like sched-utils: given the -r
parameter, schedtool displays each policy's minimum and maximum priority.
#> schedtool -r
N: SCHED_NORMAL  : prio_min 0, prio_max 0
F: SCHED_FIFO    : prio_min 1, prio_max 99
R: SCHED_RR      : prio_min 1, prio_max 99
B: SCHED_BATCH   : prio_min 0, prio_max 0
I: SCHED_ISO     : policy not implemented
D: SCHED_IDLEPRIO: policy not implemented
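The probe relies on sched_get_priority_min(2) and sched_get_priority_max(2). A rough Python equivalent for the mainline policies, printing in a similar layout; probe() is hypothetical, and SCHED_ISO/SCHED_IDLEPRIO are left out because Python has no constants for them.

```python
import os

# (mode letter, name used in this README, Python constant)
POLICIES = [
    ("N", "SCHED_NORMAL", os.SCHED_OTHER),
    ("F", "SCHED_FIFO", os.SCHED_FIFO),
    ("R", "SCHED_RR", os.SCHED_RR),
    ("B", "SCHED_BATCH", os.SCHED_BATCH),
]


def probe():
    """Return one line per policy with its min and max static priority."""
    lines = []
    for letter, name, policy in POLICIES:
        try:
            lo = os.sched_get_priority_min(policy)
            hi = os.sched_get_priority_max(policy)
            lines.append("%s: %-14s: prio_min %d, prio_max %d"
                         % (letter, name, lo, hi))
        except OSError:
            # The kernel answers EINVAL for policies it does not know.
            lines.append("%s: %-14s: policy not implemented" % (letter, name))
    return lines


if __name__ == "__main__":
    for line in probe():
        print(line)
```

On a stock Linux kernel this should reproduce the N, F, R and B lines of the listing.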
POLICY OVERVIEW + WHERE TO USE:
-------------------------------
SCHED_NORMAL
is the standard scheduling policy and good for the average
job with reasonable interaction.
SCHED_FIFO and SCHED_RR
are for real-time constraints.
Don't use them for normal stuff: they won't let lower-priority processes
run until they block in a system call like read() or actively give up
the CPU via the system call sched_yield(2). SCHED_FIFO has no timeslice
at all; SCHED_RR uses a timeslice only to take turns among processes of
the same static priority.
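Two related calls are worth knowing here: sched_rr_get_interval(2) reports the round-robin quantum the kernel uses for a process, and sched_yield(2) is how a real-time task gives up the CPU voluntarily. A minimal Python sketch; rr_quantum_ms() and cooperative_loop() are hypothetical helpers.

```python
import os


def rr_quantum_ms(pid=0):
    """Round-robin quantum reported by the kernel for `pid`
    (0 = this process), in milliseconds."""
    return os.sched_rr_get_interval(pid) * 1000.0


def cooperative_loop(jobs):
    """Run callables one by one, yielding the CPU after each.

    Under SCHED_FIFO/SCHED_RR, sched_yield() only moves this task to the
    end of the queue for its own static priority: tasks of equal priority
    get a turn, lower-priority tasks still do not run.
    """
    results = []
    for job in jobs:
        results.append(job())
        os.sched_yield()
    return results
```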
SCHED_BATCH
is encouraged for long-running and non-interactive
processes; the timeslice is considerably longer (1.5s, I think),
but these processes can be preempted by other ones at almost any time
to guarantee interactivity.
Processes won't get any interactive boosts.
Users are encouraged to set their computing jobs to SCHED_BATCH. Or, as
admin of a compute server, you could set their shells to SCHED_BATCH
via the login script; every process started from such a shell inherits
the policy.
SCHED_BATCH is included in mainline kernels 2.6.16 and later.
SCHED_ISO [patch needed, see INSTALL]
is a new mode, currently only in Con's patches, to mimic the
real-time class for non-root users. To quote Con:
"Any task trying to start as real time that doesn't have authority to do so
will be set to SCHED_ISO. This is a non-expiring scheduler policy designed to
guarantee a timeslice within a reasonable latency while preventing starvation.
Good for gaming, video at the limits of hardware, video capture etc.
It is best set using the schedtool by a normal user trying to start something
as SCHED_RR." [ http://kerneltrap.org/node/view/2159 ]
SCHED_ISO is now somewhat deprecated: SCHED_RR is now possible for normal
users, albeit only to a limited extent (see RLIMIT_RTPRIO in newer kernels).
SCHED_IDLEPRIO [patch needed, see INSTALL]
SCHED_IDLEPRIO was formerly called SCHED_BATCH in the -ck patchset; the
-ck SCHED_BATCH has nothing to do with the mainline SCHED_BATCH!
It is a policy where the process gets no interactive boost
(through sleeping etc.) and only idle CPU time.
For more information, read the file SCHED_DESIGN as a good overview, but
be warned that *some* things in it may be outdated by the new O(1) patches.
Then proceed to the man page for sched_setscheduler(2); it gives a very good
overview and is _highly_ recommended.
FINAL WORDS / CONTACT:
----------------------
If you feel you are able to make this software better or you can report
some numbers with the different scheduling policies, please contact me.
Feedback is appreciated.
Please use freshmeat.net's "contact author"-feature to do so.
THANKS:
-------
Thanks go out to (in no particular order)
o Ingo Molnar
o Con Kolivas for suggesting the -e switch and submitting the patch for SCHED_ISO
o Samuli Kärkkäinen, the quality-verification engineer
o my girlfriend and supporting friends
- -- - -- -
[2]:
A bit simplified; it's not all that easy :-) Go on to Appendix A for
an example of how scheduling is performed in Solaris.
[3]: (see also [4])
Nice level and dynamic priority are somewhat "strange": sometimes, higher
values mean higher priority (to be put on CPU when process ready); sometimes,
lower values mean higher priority.
On my system, I can currently confirm the following:
- The nice-level is SUBTRACTED from the dynamic priority, so giving a
process nice -10 INCREASES its priority (value-wise) by 10 points.
- In the end, higher values mean higher priority.
Use "ps -eO pri,nice" and look for yourself.
APPENDIX A: INTRODUCTION TO MULTI-LEVEL-FEEDBACK-QUEUE-SCHEDULING
-----------------------------------------------------------------
This appendix is based on material from the "System Programming I"
course at my university; the examples use Solaris 2.x, but I think
Linux does it in a >similar< (albeit less complicated) way.
Solaris has 60 wait queues for the class TS (TimeSharing); there are
other classes as well, such as System and RealTime.
The TS dispatch table looks like this (it is basically a set of rules,
one row per queue):
Level  ts_quantum  ts_tqexp  ts_maxwait  ts_lwait  ts_slpret
    0         200         0           0        50         50
    .           .         .           .         .          .
   44          40        34           0        55         55
   45          40        35           0        56         56
    .           .         .           .         .          .
   59          20        49       32000        59         59
You can display all these numbers on a Solaris-box using
# dispadmin -c TS -g
Level:
just a queue ID (in Solaris, a higher level means a higher priority).
ts_quantum:
the maximum timeslice, i.e. the maximum time the process is allowed
to run continuously before it is interrupted and another process is
put on the CPU.
ts_tqexp:
if the process uses up its timeslice entirely, it is put into this
queue [cf. Level].
ts_maxwait:
the maximum time, in seconds, a process may wait in this queue without
being run.
ts_lwait:
if a process stays in the current queue longer than ts_maxwait, it is
put into this queue.
ts_slpret:
the queue a process is put into after it was blocked, e.g. in a
syscall.
Let's start an imaginary process and look at what happens:
Start ->
-> queue 59, ts_quantum 20ms -> queue 49, ts_quantum 40ms
-> queue 39, ts_quantum 80ms -> queue 29, ts_quantum 120ms
-> queue 19, ts_quantum 160ms -> queue 9, ts_quantum 200ms
You can see how this purely number-crunching process is put into queues
that allow it to use the CPU for longer and longer stretches, at ever
lower levels.
Now let the process perform a blocking action:
queue 0, ts_quantum 200ms, after e.g. 100ms blocking call! -->
queue 50, ts_quantum 40ms, after e.g. 20ms blocking call! -->
queue 58, ts_quantum 40ms
Now you see how this process is "punished" with shorter timeslices (while
moving up to higher levels), or, from another point of view: the scheduler
assumes this is an interactive process that computes a bit and then outputs
data, so it is put into a queue whose quantum roughly matches the computing
time until such an output occurs.
So the scheduler knows a lot about the current state of the
machine and can plan accordingly.
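The two walks above can be replayed mechanically. The sketch keeps only the rows whose values this appendix states or implies (the table excerpt plus the two walks); it is not the full Solaris dispatch table, and TABLE and replay() are hypothetical names. Level 9's ts_tqexp is not given here, so a walk has to stop there.

```python
# level: (ts_quantum in ms, ts_tqexp, ts_slpret); None = not stated here.
TABLE = {
    0: (200, 0, 50),
    9: (200, None, None),
    19: (160, 9, None),
    29: (120, 19, None),
    39: (80, 29, None),
    44: (40, 34, 55),
    45: (40, 35, 56),
    49: (40, 39, None),
    50: (40, None, 58),
    58: (40, None, None),
    59: (20, 49, 59),
}


def replay(level, events):
    """Follow `events` ("expire" = used the whole quantum, "sleep" = blocked)
    and return the visited (level, quantum) pairs."""
    path = [(level, TABLE[level][0])]
    for event in events:
        _, tqexp, slpret = TABLE[level]
        nxt = tqexp if event == "expire" else slpret
        if nxt is None:
            raise ValueError("next level for %r at level %d not known"
                             % (event, level))
        level = nxt
        path.append((level, TABLE[level][0]))
    return path
```

replay(0, ["sleep", "sleep"]) yields [(0, 200), (50, 40), (58, 40)], the second walk above.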
The dynamic priority is something like an age: the higher[4] it is, the more
likely the process gets a seat (the CPU) :)
There are 4 rules:
-if a process is not run, it ages - the dynamic priority rises.
-if a process is running, the dynamic priority is lowered.
-the process with the (at the moment) highest priority is put onto the CPU.
-processes with lower priority are/can be interrupted by processes with
higher priority.
This guarantees that no process runs for too long and no process
waits for too long.
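The four rules can be turned into a toy simulation. The step sizes (+1 per tick of waiting, -2 per tick of running) and the tie-break by name are arbitrary assumptions for illustration, not values taken from Solaris or Linux; simulate() is a hypothetical helper. Rule 4 is implicit because the choice is re-evaluated every tick.

```python
def simulate(prios, ticks, age_step=1, run_penalty=2):
    """Return which process runs in each tick under the four aging rules."""
    prios = dict(prios)
    schedule = []
    for _ in range(ticks):
        # Rule 3: the highest dynamic priority runs (ties: first by name).
        running = max(sorted(prios), key=prios.get)
        schedule.append(running)
        for name in prios:
            if name == running:
                prios[name] -= run_penalty  # rule 2: running lowers it
            else:
                prios[name] += age_step     # rule 1: waiting raises it
    return schedule
```

Even though B starts far behind A, simulate({"A": 5, "B": 0}, 10) gets B onto the CPU by the third tick and then alternates the two.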
-End Of Documentation
- -- -
[4]:
higher (priority) in the sense of more important to run in the near future;
higher does not automatically mean a higher value in its PCB[5].
[5]:
Process Control Block, a structure in which important accounting and
other useful information about a process is stored, usually used only by
the kernel.