HOW-TO use rt-app and workgen to emulate a workload?
**** rt-app and workgen ****
rt-app/workgen is a tool that can be used to emulate a use case. Not only can
the sleep and run pattern of tasks be emulated, but also the dependencies
between tasks, like accessing the same critical resources, creating sequential
wake ups or syncing the wake up of threads. The use case is described in a
JSON-like file which is parsed first by workgen and then by rt-app.
workgen is a python script that parses, checks and updates the JSON-like file
so that it strictly matches the JSON grammar before calling rt-app, which
executes the sequence described in it.
Even though rt-app itself expects JSON, workgen allows some freedom compared
to the stricter JSON grammar rules. For example:
- You don't have to ensure the uniqueness of keys in a JSON object; workgen
checks the file and makes each key unique by appending an index if necessary.
This feature is quite useful when you describe a sequence made of the same
kind of events, e.g. a sequence of multiple sleep and run events (see the
sketch after this list).
- Some values can be omitted and workgen will add them when it parses the file
before starting the use case.
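As a quick sketch of the key-uniqueness relaxation (the thread name and the
durations below are purely illustrative), the fragment below repeats the "run"
and "sleep" keys; workgen will make them unique by appending an index before
handing the file to rt-app:
"thread0" : {
    "run" : 10000,
    "sleep" : 10000,
    "run" : 5000,
    "sleep" : 20000
}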
**** Compiling and cross-compiling rt-app ****
rt-app uses libjson-c to parse the .json file.
You can either install the libjson-c2 and libjson-c-dev deb packages or compile
it from the source code available here: https://github.com/json-c/json-c
The following sequence can be used to cross compile rt-app with static linkage
for aarch64:
For libjson-c:
export ac_cv_func_malloc_0_nonnull=yes
export ac_cv_func_realloc_0_nonnull=yes
./autogen.sh
./configure --host=aarch64-linux-gnu --disable-shared --enable-static
make
For rt-app:
export ac_cv_lib_json_c_json_object_from_file=yes
./autogen.sh
./configure --host=aarch64-linux-gnu LDFLAGS=" --static -L<path to parent of json repo>/json-c/." CFLAGS="-I<path to parent of json repo>" --with-deadline
make
e.g, with a directory structure like the following:
$ tree -d
.
├── autom4te.cache
├── build
├── build-aux
├── doc
│   └── examples
│       ├── cpufreq_governor_efficiency
│       ├── merge
│       └── tutorial
├── json-c
│   ├── autom4te.cache
│   ├── doc
│   │   └── html
│   └── tests
├── libdl
├── m4
└── src
the configure step becomes
./configure --host=aarch64-linux-gnu LDFLAGS="--static -L./../json-c/." CFLAGS="-I./.." --with-deadline
**** json file skeleton ****
The JSON file that describes a workload is made of 3 main objects: tasks,
resources and global. Only the tasks object is mandatory: default values are
used if the global object is not defined, and the resources object is only kept
for backward compatibility with old versions of rt-app. There is no benefit in
declaring resources anymore, as they are now dynamically created by rt-app when
it parses the file. A minimal skeleton is shown below.
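As a minimal sketch (the thread name and event durations are placeholders), a
workload file therefore only needs the "tasks" object; the "global" object can
be added to override defaults and "resources" can be omitted entirely:
{
    "global" : {
        "duration" : 5
    },
    "tasks" : {
        "thread0" : {
            "run" : 10000,
            "sleep" : 10000
        }
    }
}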
**** global object ****
The global object defines parameters for the whole use case:
* duration : Integer. Duration of the use case in seconds. All the threads will
be killed once the duration has elapsed. If -1 is set, the use case runs
indefinitely until all threads kill themselves (for example when a finite
number of loops has been defined in their running pattern) or until a signal
is received to stop the use case.
* calibration : String or Integer. A String defines the CPU that will be used
to calibrate the ns-per-loop value; "CPU0" is the default value (see the run
event section for details about the ns-per-loop value). An Integer skips the
calibration step and uses the integer value as ns per loop.
* default_policy : String. Default scheduling policy of threads. Default
value is SCHED_OTHER.
* pi_enabled : Boolean. Enable priority inheritance on mutexes. Default value
is False.
* lock_pages : Boolean. Lock the memory pages in RAM. Locking the pages in RAM
ensures that your RT thread will not be stalled while a page is moved from swap
to RAM. Locking pages is only possible for non-CFS tasks. Default value
is False.
* logdir : String. Path to store the various log files. The default path is
the current directory (./).
* log_basename : String. Prefix used for all log files of the use case.
"rt-app-" is used by default.
* log_size : String or Integer. An Integer defines a fixed size in MB for the
temporary buffer (size per thread) that is used to store the log data
before saving them in a file. This temporary buffer is used as a circular
buffer, so the oldest data are lost in case of overflow. A String selects a
predefined behavior:
- "file" stores the log data directly in the file without using a temporary
buffer.
- "Disable" disables the log mechanism.
- "Auto" lets rt-app compute a buffer size that will not overflow during the
use case.
The use of a temporary buffer prevents the threads from unexpected waits
during I/O accesses. The "Auto" mode is not implemented yet and falls back to
"file" mode for the moment.
* ftrace : Boolean. If enabled, rt-app logs the main events of the use case in
ftrace. Default value is False.
* gnuplot : Boolean. If True, rt-app creates a gnuplot-compatible file for
each thread (see the gnuplot section for more details). Default value is False.
"io_device" : Text. Path to the file which will be written to create IO-bounded
busy loop. Specify it carefully since it might damage the specified file.
Default value is "/dev/null".
"mem_buffer_size" : Integer. The size of per-thread memory buffer in byte being
used to create IO-bounded and memory-bounded busy loop. Default value is
4194304(4MB).
* cumulative_slack : Boolean. Accumulate the slack (see below) measured during
successive timer events in a phase. Default value is False (slack is the time
between the end of the last event and the end of the phase).
*** default global object:
"global" : {
"duration" : -1,
"calibration" : "CPU0",
"default_policy" : "SCHED_OTHER",
"pi_enabled" : false,
"lock_pages" : false,
"logdir" : "./",
"log_size" : "file",
"log_basename" : "rt-app",
"ftrace" : false,
"gnuplot" : false,
"io_device" : "/dev/null"
"mem_buffer_size" : 4194304,
"cumulative_slack" : false
}
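For instance, a use case that runs for 10 seconds, skips the calibration step
by forcing a ns-per-loop value, and enables the ftrace markers could override
only the relevant keys, the others keeping the defaults above (the value of
120 ns per loop is purely illustrative and would normally come from a previous
calibration run):
"global" : {
    "duration" : 10,
    "calibration" : 120,
    "ftrace" : true
}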
**** tasks object ****
The tasks object is made of sub-objects that describe threads. No kind of
object other than thread objects is allowed. The key of each object will be
used as the thread's name.
"tasks" : {
"thread_name1" : {
...
},
"thread_name2" : {
...
},
"thread_name3" : {
...
},
...
}
*** thread object ***
A thread object describes the behavior of one kind of thread, which means that
several threads can be created from the same object (see the instance
parameter below).
* instance : Integer. Define the number of threads that will be created with
the properties of this thread object. Default value is 1.
* policy : String. Define the scheduling policy of the thread. default_policy
is used if not defined.
* priority : Integer. Define the scheduling priority of the thread. The value
must be within the range allowed by the policy. The default priority is
0 for the SCHED_OTHER class and 10 for the other classes.
* dl-runtime : Integer. Define the runtime budget for the deadline scheduling
class. Default value is 0. The unit is usec.
* dl-period : Integer. Define the period duration for the deadline scheduling
class. Default value is dl-runtime. The unit is usec.
* dl-deadline : Integer. Define the deadline parameter for the deadline
scheduling class. Default value is dl-period. The unit is usec.
* cpus: Array of Integer. Define the CPU affinity of the thread. Default
value is all CPUs of the system. An example : "cpus" : [0, 2, 3]
* delay: Integer. Initial delay before a thread starts execution. The unit
is usec.
* phases : Object. Contains a sequence of phase sub-objects; each phase is
itself a sequence of events (see the events section below) and the phases are
executed in the order in which they are listed.
If there is only 1 phase, the sequence of events can be put directly in the
thread object instead of creating a phases object. As an example, the 2 objects
below are equivalent:
"task" : {
"thread1" : {
"phases" : {
"phase1" : {
"run" : 10,
"sleep" : 10
}
}
}
}
"task" : {
"thread1" : {
"run" : 10,
"sleep" : 10
}
}
"cpus" can be specified at task level or phase level. As an example, below
creates two threads. One thread will run with affinity of CPU 2 and 3. The
second task will run first phase with affinity to CPU 0 and 1, second phase with
affinity to CPU 2, and the third phase with affinity to CPU 4, 5, and 6 (it will
inherit the affinity from the task level):
"task" : {
"thread1" : {
"cpus" : [2,3],
"phases" : {
"phase1" : {
"run" : 10,
"sleep" : 10
}
}
}
"thread2" : {
"cpus" : [4,5,6],
"phases" : {
"phase1" : {
"cpus" : [0,1],
"run" : 10,
"sleep" : 10
}
"phase2" : {
"cpus" : [2],
"run" : 10,
"sleep" : 10
}
"phase3" : {
"run" : 10,
"sleep" : 10
}
}
}
}
* loop : Integer. Define the number of times the parent object must be run.
The parent object can be a phase or a thread. For a phase object, loop
defines the number of times the phase will be executed before starting the
next phase. For a thread object, loop defines the number of times the
complete "phases" object will be executed. The default value is -1.
*** events ***
Events are simple actions that will be performed by the thread or on the
thread. They have to be listed in execution order.
* run : Integer. Emulate the execution of a load. The duration is defined in
usec but the run event effectively executes, a number of times, a loop that
wastes CPU cycles. When the run event is executed, the duration is converted
into a number of loops using the ns-per-loop value (see calibration). This way
of working makes it possible to emulate a load whose duration varies with the
frequency or the compute capacity of the CPU.
* runtime : Integer. The duration is defined in usec. Similar to the
run event, it emulates the execution of a load. Unlike run, runtime
runs for a specific amount of time irrespective of the compute
capacity or the frequency of the CPU.
* sleep : Integer. Emulate the sleep of a task. The duration is defined in
usec.
* mem : Integer. Emulate a memory write operation. The value defines the size
in bytes to be written into the memory buffer. The size of the memory buffer is
defined by "mem_buffer_size" in the "global" object.
* iorun : Integer. Emulate an IO write operation. The value defines the size
in bytes to be written into the IO device specified by "io_device" in the
"global" object.
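Putting these basic events together (all sizes and durations below are
illustrative), the phase sketched here runs for 10 ms, writes 64 KB into the
memory buffer, writes 1 KB to the io_device and then sleeps for 10 ms:
"phase0" : {
    "run" : 10000,
    "mem" : 65536,
    "iorun" : 1024,
    "sleep" : 10000
}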
* timer : Object. Emulate the wake up of the thread by a timer. A timer differs
from a sleep event in the start time of its duration: a sleep duration starts
at the beginning of the sleep event whereas a timer duration starts at the end
of the previous use of the timer. Timer events are therefore immune to
preemption, frequency scaling and the compute capacity of a CPU. The initial
starting time of the timer is set during its 1st use.
|<------19---------->|<------19---------->|<------19---------->|
task A ...|run 5|timer 19 |------run 5|timer 19|run 5|timer 19 |
task B |run 10 |
As an example to point out the difference between sleep and timer, let's
consider a task A that runs 5 and then sleeps 10. The period of task A should
be 15. Let's add a task B that runs 5 and uses a timer to wake up every 19.
|<------15------>|<------15------>|<------19---------->|<------15------>|<------16------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|
taskA ...|run 5|sleep 10 |run 5|sleep 10 |----run 5|sleep 10 |run 5|sleep 10 |-run 5|sleep 10 |----run 5|sleep 10 |----run 5|sleep 10 |----run 5|sleep 10 |----run 5|sleep 10 |----run 5|sleep 10 |
taskB ...|------run 5|timer 19|--run 5|timer 19 |run 5|timer 19 |run 5|timer 19 |run 5|timer 19 |run 5|timer 19 |run 5|timer 19 |run 5|timer 19 |run 5|timer 19 |
|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|<------19---------->|
We can see that the period of task B stays at 19 even when the run event is
delayed because of scheduling preemption, whereas the period of task A starts
at 15 but increases to 19 because of the scheduling delay.
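Assuming durations in usec, the task B pattern above could be sketched as
follows (the timer name is arbitrary): each iteration runs for about 5000 usec
and then waits on a timer with a 19000 usec period, so the period stays at
19000 usec regardless of preemption:
"taskB" : {
    "run" : 5000,
    "timer" : { "ref" : "timerB", "period" : 19000 }
}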
"unique" timer device: When a thread that uses a timer is instanciated, the
timer will be shared across the thread instance which disturbs the original
sequence. In order to have 1 timer per instance, you can use the "unique"
prefix so a new timer will be created for each instance. Uniqueness
applies to the thread boundary which means that using the same unique
name in the sequence of a thread will refer to the same timer like the example
below:
"phases" : {
"light" : {
"loop" : 10,
"run" : 1000,
"timer" : { "ref" : "unique", "period" : 30000 }
},
"heavy" : {
"loop" : 10,
"run" : 4000,
"timer" : { "ref" : "unique", "period" : 30000 }
}
}
If you want to refer to different timers you must use different names, like
below:
"phases" : {
"light" : {
"loop" : 10,
"run" : 1000,
"timer" : { "ref" : "uniqueA", "period" : 30000 }
},
"heavy" : {
"loop" : 10,
"run" : 4000,
"timer" : { "ref" : "uniqueB", "period" : 400000 }
}
}
Timers can work with a "relative" or an "absolute" reference. By default they
work in "relative" mode, but this mode can also be explicitly specified as
follows:
"phases" : {
"phase0" : {
"loop" : 10,
"run0" : 10000,
"timer0" : { "ref" : "unique", "period" : 20000, "mode" : "relative" },
}
"relative" mode means that the reference for setting the next timer event is
relative to the end of the current phase. This in turn means that if, for some
reason (i.e., clock frequency was too low), events in a certain phase took too
long to execute and the timer of that phase couldn't actually fire at all, the
next phase won't be affected. For example:
+---- + +-----+ +-------------------+-----+ +---
|r0 | |r0 | |r0 |r0 | |r0
| | | | | | | |
o-----------o-----------o-------------------o-----------o------->
0 10 20 30 40 50 60 70 80 100 120
^ ^ ^ ^
| | | MISS! |
+ + + +
Timer0 Timer0 Timer0 Timer0
In this example the character "o" denotes when phases finish/start. The third
activation of Timer0 is missed, since r0 executed for more than 20ms. However
the next phase is not affected, as Timer0 was set considering the instant of
time when the misbehaving r0 finished executing.
"absolute" mode is specified as the following:
"phases" : {
"phase0" : {
"loop" : 10,
"run0" : 10000,
"timer0" : { "ref" : "unique", "period" : 20000, "mode" : "absolute" },
}
"absolute" mode means that the reference for setting the next timer event is
fixed and always consider the starting time of the first phase. This means that
if, for some reason (i.e., clock frequency was too low), events in a certain
phase took too long to execute and the timer of that phase couldn't actually
fire at all, the next phase (and potentially other subsequent phases) _will_ be
affected. For example, considering again the example above:
+---- + +-----+ +-------------------+-----+-----+ +---
|r0 | |r0 | |r0 |r0 |r0 | |r0
| | | | | | | | |
o-----------o-----------o-------------------o-----o---------o---->
0 10 20 30 40 50 60 70 80 100 120
^ ^ ^ ^ ^
| | | MISS! | MISS! |
+ + + + +
Timer0 Timer0 Timer0 Timer0 Timer0
The third activation of Timer0 is missed, since r0 executed for more than 20ms.
Even though the 4th activation of r0 executes for 10ms (as specified in the
configuration), the 4th Timer0 is still missed because the reference didn't
change. In this example the 5th activation of r0 then managed to recover, but
in general it depends on how badly a certain phase misbehaves.
* lock : String. Lock the mutex defined by the string value.
* unlock : String. Unlock the mutex defined by the string value.
* wait : Object {"ref" : String, "mutex" : String }. Block the calling thread
until another thread sends a wake up signal to the resource defined by "ref".
The mutex defined by "mutex" is used during the block sequence: See
pthread_cond_wait() for more details about the sequence (especially regarding
the use of the mutex).
* signal : String. Send a wake up signal to the resource defined by the
string. The 1st thread in the wait list will be woken up. See
pthread_cond_signal() for more details.
* broad : String. Send a wake up signal to the resource defined by the
string. All threads that are blocked on the resource will wake up. See
pthread_cond_broadcast() for more details.
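As a sketch of these primitives (the names "mutexA" and "condA" and the
durations are illustrative), a consumer thread can block on a condition until a
producer thread signals it. As with pthread_cond_wait(), a signal sent while no
thread is waiting is lost, so the durations have to be chosen so that the
consumer is already blocked when the producer signals:
"consumer" : {
    "lock" : "mutexA",
    "wait" : { "ref" : "condA", "mutex" : "mutexA" },
    "unlock" : "mutexA",
    "run" : 5000
},
"producer" : {
    "run" : 10000,
    "lock" : "mutexA",
    "signal" : "condA",
    "unlock" : "mutexA",
    "sleep" : 10000
}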
* sync : Object {"ref" : String, "mutex" : String }. Atomically wakes
up a blocked thread and then blocks the calling thread on the condition.
taskA ...|run 5|wait|------------|run 5| wait |----------------
taskB ...-------------------|sync|--------
The sync event "sync" : {"ref" : "CondA", "mutex" : "mutexA" } generates the
following sequence:
{
    "lock" : "mutexA",
    "signal" : "CondA",
    "wait" : { "ref" : "CondA", "mutex" : "mutexA" },
    "unlock" : "mutexA"
}
* barrier : String. Used at least as a pair where the name must match.
Any number of matching uses will cause all threads hitting the barrier event
to wait for a signal. The number of users is recorded, so that when the last
user hits the barrier event, that thread broadcasts and continues to the next
step. This is conceptually exactly the same as a pthread_barrier_wait
operation; however, using a pthread_barrier would impose some strict conditions
on usage around thread cleanups - primarily that you cannot cancel an
in-progress barrier operation, which would mean that cleanup could only happen
at the end of a loop cycle (i.e. when all phases are complete). This would be
too restrictive for most uses, so an alternative is used here.
In this implementation, the barrier event manages its own mutex and uses a
variable shared between users to track waiting tasks, protected by the mutex.
You must use a unique name for each sync point since the number of users is
taken from the number of references to the name in the input json, i.e. each
name must represent a single sync point and be shared amongst all threads
which wish to synchronise at that point.
The barrier event "barrier" : "SyncPointA"
generates the following sequence:
{
"lock" : "SyncPointA" (internal mutex),
If the shared variable is 0:
"signal" : "SyncPointA" (internal condvar),
Else:
decrement the shared variable,
"wait" : { "ref" : "SyncPointA" (internal condvar),
"mutex" : "SyncPointA" (internal mutex) },
increment the shared variable,
"unlock" : "SyncPointA" (internal mutex)
}
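For instance (thread and barrier names are illustrative), two threads that must
start each iteration together can reference the same barrier name; since
"SyncPointA" is referenced twice in the file, both threads are released once
the second one reaches the barrier event:
"threadA" : {
    "loop" : 10,
    "barrier" : "SyncPointA",
    "run" : 10000,
    "sleep" : 10000
},
"threadB" : {
    "loop" : 10,
    "barrier" : "SyncPointA",
    "run" : 20000
}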
* suspend : String. Block the calling thread until another thread wakes it up
with resume. The String can be left empty as it will be filled by workgen with
the right thread's name before starting the use case.
* resume : String. Wake up the thread defined by the string.
taskA ...|run 5|suspend |----------------|run 5|suspend |----------------
taskB ...-------------------|resume taskA|run 10 |--------------------
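The taskA/taskB sequence above could be sketched as follows (thread names and
durations are illustrative); the empty "suspend" string is filled by workgen
with the thread's own name:
"taskA" : {
    "run" : 5000,
    "suspend" : ""
},
"taskB" : {
    "sleep" : 20000,
    "resume" : "taskA",
    "run" : 10000
}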
* yield: String. Calls pthread_yield(), freeing the CPU for other tasks. This has a
special meaning for SCHED_DEADLINE tasks. String can be empty.
**** Trace and Log ****
Some trace and log hooks have been added to ease debugging and to monitor
various metrics of the use cases.
*** Trace ***
A trace can be inserted into the ftrace buffer for each event in order to sync
kernel events like sched_switch with the use case's sequence.
A simple example of an ftrace log:
thread0-23345 [003] 115591.560884: tracing_mark_write: [0] starts
thread0-23345 [003] 115591.560890: tracing_mark_write: [0] begins loop 0 phase 0 step 0
thread0-23345 [003] 115591.560894: tracing_mark_write: [0] executing 0
thread0-23345 [003] 115591.580212: tracing_mark_write: [0] executing 1
thread0-23345 [003] 115591.580217: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
<idle>-0 [003] 115591.670198: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120
thread0-23345 [003] 115591.670243: tracing_mark_write: [0] end loop 0 phase 0 step 0
thread0-23345 [003] 115591.670251: tracing_mark_write: [0] begins loop 0 phase 0 step 1
thread0-23345 [003] 115591.670254: tracing_mark_write: [0] executing 0
thread0-23345 [003] 115591.688081: tracing_mark_write: [0] executing 1
thread0-23345 [003] 115591.688085: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
<idle>-0 [003] 115591.778063: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120
thread0-23345 [003] 115591.778108: tracing_mark_write: [0] end loop 0 phase 0 step 1
thread0-23345 [003] 115591.778116: tracing_mark_write: [0] begins loop 0 phase 0 step 2
thread0-23345 [003] 115591.778119: tracing_mark_write: [0] executing 0
thread0-23345 [003] 115591.794619: tracing_mark_write: [0] executing 1
thread0-23345 [003] 115591.794623: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
<idle>-0 [003] 115591.884607: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120
thread0-23345 [003] 115591.884652: tracing_mark_write: [0] end loop 0 phase 0 step 2
...
thread0-23345 [003] 115593.394022: tracing_mark_write: [0] begins loop 0 phase 0 step 17
thread0-23345 [003] 115593.394025: tracing_mark_write: [0] executing 0
thread0-23345 [003] 115593.410583: tracing_mark_write: [0] executing 1
thread0-23345 [003] 115593.410594: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
<idle>-0 [003] 115593.500567: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120
thread0-23345 [003] 115593.500612: tracing_mark_write: [0] end loop 0 phase 0 step 17
thread0-23345 [003] 115593.500620: tracing_mark_write: [0] begins loop 0 phase 0 step 18
thread0-23345 [003] 115593.500623: tracing_mark_write: [0] executing 0
thread0-23345 [003] 115593.520198: tracing_mark_write: [0] executing 1
thread0-23345 [003] 115593.520202: sched_switch: prev_comm=thread0 prev_pid=23345 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
<idle>-0 [003] 115593.610185: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=thread0 next_pid=23345 next_prio=120
thread0-23345 [003] 115593.610230: tracing_mark_write: [0] end loop 0 phase 0 step 18
thread0-23345 [003] 115593.610258: tracing_mark_write: [0] exiting
The main events are:
* "tracing_mark_write: [0] starts" indicates the start of the thread with
index 0
* "tracing_mark_write: [0] begins loop A phase B step C" indicates that the
thread with index 0 starts to execute step C (the Cth iteration of the phase
loop) of phase B of the main loop A
* "tracing_mark_write: [0] executing X" indicates that the thread with index 0
executes the event X of the current phase
* "tracing_mark_write: [0] end loop A phase B step C" indicates that the
thread with index 0 has finished executing step C of phase B of the main loop A
* "tracing_mark_write: [0] exiting" indicates that the thread with index 0 has
finished executing its events.
*** Log and gnuplot ***
You can log some metrics for each phase that is executed by a thread. These
metrics are:
- perf: number of loops that have been executed by run events
- run: time spent by the thread executing the run events
- period: duration of the execution of the complete phase
- start/end: absolute start and end time of a phase. The same time base is
used in ftrace
- rel_st: start time of a phase relative to the beginning of the use case
- slack: if the global option "cumulative_slack" (see above) is false, time
between the end of the last event and the end of the phase, e.g.
taskA ...|-- run5 --|- sleep5 -|-- run5--|..timer20.|-- run5 --|- sleep5 -|-- run6 --|.timer20.|
<--------------- period 20 --------------> <--------------- period 20 -------------->
<-slack5-> <slack4->
it can also be negative if the execution of a phase's events overshoots the
current period, e.g.
taskA ...|-- run5 --|- sleep5 -|------- run30 ------xxxxxxxxxx|
<--------------- period 20 -------------->
<slack-5>
if global option "cumulative_slack" is true, all the intermediate slacks of a
phase with multiple timers are accumulated and reported when the phase
completes
- c_duration: sum of the configured duration of run/runtime events
- c_period: sum of the timer(s) period(s)
- wu_lat: sum of wakeup latencies after timer events
Below is an extract of a log:
# Policy : SCHED_OTHER priority : 0
#idx perf run period start end rel_st slack c_duration c_period wu_lat
0 92164 19935 98965 504549567051 504549666016 2443 78701 20000 100000 266
0 92164 19408 99952 504549666063 504549766015 101455 80217 20000 100000 265
0 92164 19428 99952 504549766062 504549866014 201454 80199 20000 100000 264
0 92164 19438 99955 504549866060 504549966015 301452 80190 20000 100000 265
0 92164 19446 99952 504549966061 504550066013 401453 80093 20000 100000 264
0 92164 19415 99953 504550066060 504550166013 501452 80215 20000 100000 263
0 92164 19388 99954 504550166059 504550266013 601451 80242 20000 100000 264
0 92164 19444 99956 504550266060 504550366015 701452 80185 20000 100000 265
Some gnuplot files are also created to generate charts based on the log files
for each thread and for each kind of metric. The format of the chart that
will be generated by gnuplot is EPS (text art has been used for the chart
below).
Measured thread0 Loop stats
Load [nb loop]
120000 ++--+----+---+----+---+---+----+---+----+--++ 506000 load ******
+ + + + + + + + + + + run ######
|$$ : : : : $$$ : : : : | period $$$$$$
110000 ++.$$$........$$$$...$...$..........$$$.$$$$+
| : $$$$$$$$ $$ $: $ $$$$$$$$ $: $
| : : : : $ : :$ : : : ++ 504000
| : : : : : : : : : |
100000 ++.........................................++
| : : : : : : : : : |
| : : : : : : : : : |
90000 ++.........................................++
| : : : : : : : : : ++ 502000
| : : : : : : : : : |
80000 ++.........................................++
| : : : : : : : : : |
| : : : : : : : : : |
70000 +******************************************** 500000
| : : : : : : : : : |
| : : : : : : : : : |
| : : : : : : : : : |
60000 ++.........................................++
| : : : : : : : : : |
| : : : : : : : : : ++ 498000
50000 ++.........................................++
| : : : : : : : : : |
| : : : : : : : : : |
40000 ++.........................................++
| : : : : : : : : : ++ 496000
|# : : : : : : : : : |
| # : : : : ### : : : : |
30000 ++.###.###....####...#...#..........###.####+
| : # :#### ### : # ######## #: #
+ + + + + + +# + + + +
20000 ++--+----+---+----+---+---+----+---+----+--++ 494000
17560556057560556057560656065606756065606756075607
Loop start time [msec]