This patch adds a built-in profiler accessible from the monitor using the "profile" command. It's currently only used by the 6510sc-core, so should be considered a work-in-progress for now.
not implemented yet
xscpu64 (65C816)
xpet (w/ SuperPET 6809 CPU)
x128 (when running in Z80 mode)
Also, the profiler is not yet banking aware (only samples the 16-bit addresses the CPU sees), so will only display the correct disassembly when the bank configuration is compatible with the runtime one.
Here’s an open issue. It would be nice to support banking. For example, if the program during profiling swaps in the KERNAL ROM, executes a function (e.g. CHROUT), returns and swaps the ROM out, and then executes another function stored in the RAM under the KERNAL ROM it would be nice to differentiate between them and have the monitor output the correct disassembly depending on execution context.
In the current implementation, the execution context is defined purely based on the call stack (a sequence of 16-bit PC pointers) so the monitor doesn’t know that the ROM for example was swapped in during a specific execution execution.
There are two high-level approaches:
Augment the execution context information with the current memory bank configuration. This requires very little additional storage per profiling context (basically 1-2 bytes), but needs some handling for what to do when the bank configuration changes during a context. For the latter I’d probably simply create a linked-list structure of such contexts.
Augment every single PC pointer with bank information creating wide/far pointers. This could be a pretty disruptive change to the monitor, but has a few advantages. For example you could potentially support applying a label specifically to a ROM memory region (e.g. with 0x02 being the ROM bank prefix):
add_label 02FFD2 .CHROUT
Both of these approaches have challenges. I'd lean towards #1 as being the most straightforward to implement.
For this first one, there doesn’t seem to be a standardized way to query the current bank configuration. Most memory models have an (int) mem_config < NUM_CONFIGS <= 256, but there is no API to query this variable, and no API for the monitor to mem_peek_with_config(int mem_config, uint16_t addr). There is an API mem_bank_peek(int bank, uint16_t adds, ...), but that’s not really useful because the current bank number is both not queryable and is also not generally well-defined for a particular bank configuration.
So to implement this, I’d suggest maybe adding a
int mem_get_bank_config()
to each memory interface
For c64mem, that one would basically be a value 0-31:
This still doesn't address banking settings internal to cartridges, but that's starting to get pretty niche. If necessary, it could be handled by augmenting the config value like the hack below for the 65816 PB register...
for c128mem.c the config would be identical to what mmu_update_config() computes and things should work fine.
For scpu64mem it seems tricky to implement proper support without the monitor having 24-bit pointers. One potential option (or maybe hack) would be to effectively bake the program bank (PB) register into the memory config value (expanding it to 16 bits). This is in behavior actually similar to what the current monitor interface does (mem_bank_peek) by adding the current WDC65816_REGS_GET_PBR() value to the provided address, but that seems technically wrong: While the PB register makes sense for program memory (e.g. disassembly), for raw memory dumps you are just as likely (if not more) to want to use the DB register instead...
Anyway, this hack would work for 65816 profiling since the PB register never auto-increments during program execution (only by JML, JSL, RTL), so it's effectively constant for an execution context.
This being said, I'd consider holding off on supporting the 65816 until the monitor gets full 24-bit support. E.g. labels, memory and disassembly listings all lack 24-bit support today.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
As an experiment I implemented approach 1 (for one of the machines). See the example below:
This program runs with the kernal ROM swapped out and a custom IRQ handler at $2403 (context [6])
When viewing context [6], the memory bank configuration (Bank) column shows two different bank configurations (5 and 6) in effect during execution. The ~middle 3 lines (with the call to context [7]) are running with configuration 6 – the KERNAL ROM is mapped in and the kernal routine at $FF9F (SCNKEY) is called.
When viewing context [7] at the bottom, the monitor correctly reads the disassembly from KERNAL ROM instead of RAM even though the ROM is currently swapped out.
What makes this less clean than ideal:
The memory bank config value is not the same as the already existing "bank" value. There are now monitor interfaces for both. This feels a bit overcomplicated, but I don't have a good solution.
The memory config value is currently unique to the profiling instructions. It would be nice if there was a more consistent/standardized way to refer to an unambiguous memory location in general. E.g. just like the current .C:xxxx prefix for memspace indicating the computer memory, there could maybe be a .C:yy:xxxx format for accessing ram under a specific memory bank configuration. This is where approach (2) in the previous message might have some advantages by creating a virtual wider global and non-overlapping address space. (Where each memory location has a unique address rather than the non-unique address you get with a memory configuration prefix)
The profiler (and monitor) now directly queries mem_get_current_bank_config() to capture bank switch changes. It's probably clearer for the memory subsystem to tell the profiler of any bank switches. This is an easy change.
Sorry for not coming back to this earlier... So whats the status on this? :)
(I also lean towards approach 1))
I think its acceptable to merge only for 6502, provided it works on all of them. There shouldn't be too many cases of banking either (if we ignore things like cartridges for a while, which are an entire different beast), depending on what needs to be done to support it, it might be OK to merge even if not all things are hooked up yet. Would be good to have a current patch to look at :)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi, likewise sorry for the delay. Here's the diff for the approach 1 implementation. This is incremental to the other two patches in this thread. If this looks ok I'm happy to quickly implement support in the same way for the other 6502-based machines.
Here's one single patch with everything. Should apply cleanly on the latest head. Configure with --enable-profiling. banking config support per above is only implemented on the x64 and x64sc cores, for the rest it's been stubbed-out (doesn't break anything, just makes profiling traces across bank changes a bit harder to read).
will have to play a bit with this... codestyle etc looks ok so far :) One thing that is missing in this patch would be the documentation update (vice.texi). Also, is it really necessary to have a configure switch for this? Does it have a notable performance impact even when profiling is disabled?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Thanks – will add the documentation.
The compiler switch was mainly a precaution. I think the runtime perf impact of having it disabled but compiled in is relatively minimal but I haven't benchmarked it. Any tips/best practices for benchmarking vice emu performance?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I dont know to be honest... i'd say if it doesnt produce a noticable difference when running VICE in warpmode, then its just fine to have it unconditionally enabled.
I noted another small "problem" - there are a bunch of "printf modifier" style warnings in the monitor code now - please try to fix those (if you dont see them, we can do this later)
And i have another question - how does this deal with different CPUs? IE the disk drives? (its 3am here now, i didnt have time to actually check - will do tomorrow :=))
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I noted another small "problem" - there are a bunch of "printf modifier" style warnings in the monitor code now - please try to fix those (if you dont see them, we can do this later)
Interesting. My compiler didn't produce any warnings – I'll look into it.
And i have another question - how does this deal with different CPUs? IE the disk drives?
Currently it only profiles maincpu, other CPUs are not affected. In the future it wouldn't be terribly difficult to enable profiling of other CPUs – either by maintaining multiple parallel profiling contexts (one per CPU) or exclusive selection of one particular CPU to profile, but I didn't have an application for this and also no good test cases.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
another "problem" - the formatting of the "help" output in the monitor is a bit unfortunate - it seems to wrap around in the default monitor window size
Edit: that might be related to what font is being used(?) - probably a good idea to not use 80, but 79 or 78 chars width for the hardcoded formatting
Last edit: gpz 2023-04-29
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Another... guess you can call it request =) It would be very nice, if the profiler output would use the literal label names instead of addresses. Other than that, it all looks very nice to me!
I am running the testbench on x64sc now, just to see if this might break something.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Thanks for checking this out! I've attached a new version with the following changes:
Added vice.texi documentation for the monitor commands
Changed linebreak column for profiler help to ~76
I've not been able to get my compiler to emit any format string warnings (building using llvm/clang for arm64, -Wpedantic -Wformat-pedantic). Can you share what warnings you get?
I tested benchmarking in warp mode with and without the --enable-profiling flag for a workload. 3 repeats:
With: --disable-profiling
6.12 s
6.04 s
6.13 s
With: --enable-profiling
6.13
6.05
6.08
So the performance impact of enabling profiling support seems to be in the noise. If you think this is sufficient, I'm happy to remove the configure option.
I am running the testbench on x64sc now, just to see if this might break something.
That looks good....
Hmm, with profiling turned on from boot, 58 million samples collected in ~6 seconds, I got the virtually the same warp execution timings.
It should probably be enabled unconditionally then.... I'd much prefer this, since IF this patch breaks something, this will increase the chance someone notices it a lot
I've not been able to get my compiler to emit any format string warnings (building using llvm/clang for arm64, -Wpedantic -Wformat-pedantic). Can you share what warnings you get?
Working one something else right now... i will post them tomorrow or so. In any case, have a look at src/vice.h - clang can probably use similar attributes as gcc (VICE_ATTR_PRINTF and friends at the bottom of the file), so if you define those macros so clang recognizes them, you should get those warning too
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
(if you don't get these, don't bother, i'll fix them here before committing)
then, in scpu64mem.c the patch removed
#include"vice.h"
Please don't do that, every file must include this :)
The same applies to other .c files you added, they should first include vice.h, followed by the standard C headers (everything in <>), and after that the vice internal includes in alphabetical order.
And last not least, the newly added .c and .h files should have the same GPL header comment as other files.
After those changes i'll commit this, and i could also post in the csdb forum and have some ppl test it - it should come in handy for doing final touches at the demos for X :)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Thanks! Here's a new patch that should address all the above.
Sorry about the removed vice.h include. That was not intentional – my editor keeps warning that that header is unused and pops up correction suggestions and I must have accidentally accepted thinking I probably added the header myself to debug something...
Here's an additive patch that adds the non-sc 6502 core as well. With this the profiler should work for the main CPU on:
x64
x64sc
x64dtv
x128 (8502 CPU)
xplus4
xvic
xcbm2
xcbm5x0
xpet (6502 CPU)
vsid
not implemented yet
xscpu64 (65C816)
xpet (w/ SuperPET 6809 CPU)
x128 (when running in Z80 mode)
Also, the profiler is not yet banking aware (only samples the 16-bit addresses the CPU sees), so will only display the correct disassembly when the bank configuration is compatible with the runtime one.
Last edit: Oskar Linde 2023-01-16
very cool!
BTW, this should also get support for the remote interface (because at the end, you want to profile right inside vscode, right? :=))
Here’s an open issue. It would be nice to support banking. For example, if the program during profiling swaps in the KERNAL ROM, executes a function (e.g. CHROUT), returns and swaps the ROM out, and then executes another function stored in the RAM under the KERNAL ROM it would be nice to differentiate between them and have the monitor output the correct disassembly depending on execution context.
In the current implementation, the execution context is defined purely based on the call stack (a sequence of 16-bit PC pointers) so the monitor doesn’t know that the ROM for example was swapped in during a specific execution execution.
There are two high-level approaches:
Augment the execution context information with the current memory bank configuration. This requires very little additional storage per profiling context (basically 1-2 bytes), but needs some handling for what to do when the bank configuration changes during a context. For the latter I’d probably simply create a linked-list structure of such contexts.
Augment every single PC pointer with bank information creating wide/far pointers. This could be a pretty disruptive change to the monitor, but has a few advantages. For example you could potentially support applying a label specifically to a ROM memory region (e.g. with 0x02 being the ROM bank prefix):
Both of these approaches have challenges. I'd lean towards #1 as being the most straightforward to implement.
For this first one, there doesn’t seem to be a standardized way to query the current bank configuration. Most memory models have an (int) mem_config < NUM_CONFIGS <= 256, but there is no API to query this variable, and no API for the monitor to mem_peek_with_config(int mem_config, uint16_t addr). There is an API mem_bank_peek(int bank, uint16_t adds, ...), but that’s not really useful because the current bank number is both not queryable and is also not generally well-defined for a particular bank configuration.
So to implement this, I’d suggest maybe adding a
to each memory interface
For c64mem, that one would basically be a value 0-31:
(i.e. what’s computed by mem_pla_config_changed() today)
And also a (mostly) stateless
for monitor access.
This still doesn't address banking settings internal to cartridges, but that's starting to get pretty niche. If necessary, it could be handled by augmenting the config value like the hack below for the 65816 PB register...
for c128mem.c the config would be identical to what mmu_update_config() computes and things should work fine.
For scpu64mem it seems tricky to implement proper support without the monitor having 24-bit pointers. One potential option (or maybe hack) would be to effectively bake the program bank (PB) register into the memory config value (expanding it to 16 bits). This is in behavior actually similar to what the current monitor interface does (mem_bank_peek) by adding the current WDC65816_REGS_GET_PBR() value to the provided address, but that seems technically wrong: While the PB register makes sense for program memory (e.g. disassembly), for raw memory dumps you are just as likely (if not more) to want to use the DB register instead...
Anyway, this hack would work for 65816 profiling since the PB register never auto-increments during program execution (only by JML, JSL, RTL), so it's effectively constant for an execution context.
This being said, I'd consider holding off on supporting the 65816 until the monitor gets full 24-bit support. E.g. labels, memory and disassembly listings all lack 24-bit support today.
As an experiment I implemented approach 1 (for one of the machines). See the example below:
This program runs with the kernal ROM swapped out and a custom IRQ handler at $2403 (context [6])
When viewing context [6], the memory bank configuration (Bank) column shows two different bank configurations (5 and 6) in effect during execution. The ~middle 3 lines (with the call to context [7]) are running with configuration 6 – the KERNAL ROM is mapped in and the kernal routine at $FF9F (SCNKEY) is called.
When viewing context [7] at the bottom, the monitor correctly reads the disassembly from KERNAL ROM instead of RAM even though the ROM is currently swapped out.
What makes this less clean than ideal:
Last edit: Oskar Linde 2023-01-17
Sorry for not coming back to this earlier... So whats the status on this? :)
(I also lean towards approach 1))
I think its acceptable to merge only for 6502, provided it works on all of them. There shouldn't be too many cases of banking either (if we ignore things like cartridges for a while, which are an entire different beast), depending on what needs to be done to support it, it might be OK to merge even if not all things are hooked up yet. Would be good to have a current patch to look at :)
Hi, likewise sorry for the delay. Here's the diff for the approach 1 implementation. This is incremental to the other two patches in this thread. If this looks ok I'm happy to quickly implement support in the same way for the other 6502-based machines.
so i will have to apply all 3 patches?
Yes – but give me some time to rebase the patches onto the latest head and squash them.
ok. In that case you can also just make it one patch :)
Here's one single patch with everything. Should apply cleanly on the latest head. Configure with
--enable-profiling. banking config support per above is only implemented on the x64 and x64sc cores, for the rest it's been stubbed-out (doesn't break anything, just makes profiling traces across bank changes a bit harder to read).will have to play a bit with this... codestyle etc looks ok so far :) One thing that is missing in this patch would be the documentation update (vice.texi). Also, is it really necessary to have a configure switch for this? Does it have a notable performance impact even when profiling is disabled?
Thanks – will add the documentation.
The compiler switch was mainly a precaution. I think the runtime perf impact of having it disabled but compiled in is relatively minimal but I haven't benchmarked it. Any tips/best practices for benchmarking vice emu performance?
I dont know to be honest... i'd say if it doesnt produce a noticable difference when running VICE in warpmode, then its just fine to have it unconditionally enabled.
I noted another small "problem" - there are a bunch of "printf modifier" style warnings in the monitor code now - please try to fix those (if you dont see them, we can do this later)
And i have another question - how does this deal with different CPUs? IE the disk drives? (its 3am here now, i didnt have time to actually check - will do tomorrow :=))
Interesting. My compiler didn't produce any warnings – I'll look into it.
Currently it only profiles maincpu, other CPUs are not affected. In the future it wouldn't be terribly difficult to enable profiling of other CPUs – either by maintaining multiple parallel profiling contexts (one per CPU) or exclusive selection of one particular CPU to profile, but I didn't have an application for this and also no good test cases.
another "problem" - the formatting of the "help" output in the monitor is a bit unfortunate - it seems to wrap around in the default monitor window size
Edit: that might be related to what font is being used(?) - probably a good idea to not use 80, but 79 or 78 chars width for the hardcoded formatting
Last edit: gpz 2023-04-29
Ok, I'll reformat to a smaller column width. I don't think it was a problem on my platform.
Another... guess you can call it request =) It would be very nice, if the profiler output would use the literal label names instead of addresses. Other than that, it all looks very nice to me!
I am running the testbench on x64sc now, just to see if this might break something.
Label names should be supported. See sample below.
I also have another patch I can share that adds comments on branch statements in disassembly, e.g.
JSR $FF9F ; -> .SCNKEY
Thanks for checking this out! I've attached a new version with the following changes:
I've not been able to get my compiler to emit any format string warnings (building using llvm/clang for arm64, -Wpedantic -Wformat-pedantic). Can you share what warnings you get?
I tested benchmarking in warp mode with and without the --enable-profiling flag for a workload. 3 repeats:
So the performance impact of enabling profiling support seems to be in the noise. If you think this is sufficient, I'm happy to remove the configure option.
Hmm, with profiling turned on from boot, 58 million samples collected in ~6 seconds, I got the virtually the same warp execution timings.
That looks good....
It should probably be enabled unconditionally then.... I'd much prefer this, since IF this patch breaks something, this will increase the chance someone notices it a lot
Working one something else right now... i will post them tomorrow or so. In any case, have a look at src/vice.h - clang can probably use similar attributes as gcc (VICE_ATTR_PRINTF and friends at the bottom of the file), so if you define those macros so clang recognizes them, you should get those warning too
Thanks! I had to install GCC to get the warnings to show up. This latest patch should address them and also removes the configure option.
Ok some minor nitpicking left... first, i see two warnings still:
and
(if you don't get these, don't bother, i'll fix them here before committing)
then, in scpu64mem.c the patch removed
Please don't do that, every file must include this :)
The same applies to other .c files you added, they should first include vice.h, followed by the standard C headers (everything in <>), and after that the vice internal includes in alphabetical order.
And last not least, the newly added .c and .h files should have the same GPL header comment as other files.
After those changes i'll commit this, and i could also post in the csdb forum and have some ppl test it - it should come in handy for doing final touches at the demos for X :)
Thanks! Here's a new patch that should address all the above.
Sorry about the removed vice.h include. That was not intentional – my editor keeps warning that that header is unused and pops up correction suggestions and I must have accidentally accepted thinking I probably added the header myself to debug something...