tests: Utility to minimize test cases, continued #527

Merged
wetneb merged 27 commits from minimize into main 2025-09-11 23:47:23 +02:00
Owner

supersedes #469

this time I opened the branch on the main repo, so that we could collaborate on this if needed

WIP that will eventually close #456.

Warning: it's ugly! Lots of code duplication. No unit tests.

I am not sure what is the right place for this sort of code (if we want to have it at all in this repo :-D ). Intuitively, mgf_dev is quite fitting, but we need access to a lot of Mergiraf internals which have only been pub(crate) so far, and we don't have access to that as things stand. Making them fully public is possible, and after all we don't claim to maintain any sort of stability for the Rust API, so maybe it's not a big deal. But if we wanted to offer some sort of public interface in the future, this would probably make it harder.

Arguably, one could also put the code in the root crate, and only expose the minimization function publicly, which could then be used by mgf_dev. For the main mergiraf binary, it would be dead code, so it wouldn't make it into the binary, but probably still inflate the compilation time a bit, and our test coverage would get a big hit (we could have a big integration test that covers most things, but it's a bit meh).

supersedes #469 this time I opened the branch on the main repo, so that we could collaborate on this if needed WIP that will eventually close #456. **Warning**: it's ugly! Lots of code duplication. No unit tests. I am not sure what is the right place for this sort of code (if we want to have it at all in this repo :-D ). Intuitively, `mgf_dev` is quite fitting, but we need access to a lot of Mergiraf internals which have only been `pub(crate) `so far, and we don't have access to that as things stand. Making them fully public is possible, and after all we don't claim to maintain any sort of stability for the Rust API, so maybe it's not a big deal. But if we wanted to offer some sort of public interface in the future, this would probably make it harder. Arguably, one could also put the code in the root crate, and only expose the minimization function publicly, which could then be used by `mgf_dev`. For the main `mergiraf` binary, it would be dead code, so it wouldn't make it into the binary, but probably still inflate the compilation time a bit, and our test coverage would get a big hit (we could have a big integration test that covers most things, but it's a bit meh).
removes the need to expose `to_merged_text`.. kind of
clippy
All checks were successful
/ test (pull_request) Successful in 1m39s
b60e47025e
add cargo alias
All checks were successful
/ test (pull_request) Successful in 44s
1f6eaa3149
ada4a changed title from WIP: tests: Utility to minimize test cases, continued to tests: Utility to minimize test cases, continued 2025-08-31 12:42:03 +02:00
Author
Owner

I think this is good enough as a start -- should we merge it?

I think this is good enough as a start -- should we merge it?
Author
Owner

Oh, wait, there are some things to fix up -- give me a sec

Oh, wait, there are some things to fix up -- give me a sec
Author
Owner

Ok should be good to go now

Ok should be good to go now
Author
Owner

@wetneb wrote in #576 (comment):

The script should just call mergiraf on the example. You can then pass the expected status code (134, if I remember well?) to the minimizer.

Sorry, still not quite following you. Looking at the source code, I think the testing command invocation ends up looking like:

bash -c <script> testing_script <output directory>

Firstly, what does testing_script in the middle of the invocation achieve?.. I thought you'd want to pass to <script> the <output_directory>, not this literal string?

Secondly, what I understand "just call mergiraf on the example" to mean is to run mergiraf merge? But doesn't that require me to know what the paths to base, left, and right files are (most importantly, their extensions)? Knowing <output_directory> only brings me half-way.

The purpose of having a configurable script is for the cases where you want to ensure some additional properties on the input files or the merge output, such as the presence of a conflict or of a certain type of content.

I'm completely blanking on this one... I think having a couple of example mgf_dev minimize invocations would really go a long way

@wetneb wrote in https://codeberg.org/mergiraf/mergiraf/pulls/576#issuecomment-6832207: > The script should just call mergiraf on the example. You can then pass the expected status code (134, if I remember well?) to the minimizer. Sorry, still not quite following you. Looking at the source code, I think the testing command invocation ends up looking like: ``` bash -c <script> testing_script <output directory> ``` Firstly, what does `testing_script` in the middle of the invocation achieve?.. I thought you'd want to pass to `<script>` the `<output_directory>`, not this literal string? Secondly, what I understand "just call mergiraf on the example" to mean is to run `mergiraf merge`? But doesn't that require me to know what the paths to base, left, and right files are (most importantly, their extensions)? Knowing `<output_directory>` only brings me half-way. > The purpose of having a configurable script is for the cases where you want to ensure some additional properties on the input files or the merge output, such as the presence of a conflict or of a certain type of content. I'm completely blanking on this one... I think having a couple of example `mgf_dev minimize` invocations would really go a long way
Owner

Sure, here's an attempt at documenting it:

First, compile mergiraf on the bundle-comments branch (say, in release mode, so that the minimizing goes faster). Let's call that binary faulty_mergiraf.

Then let's get the example:

wget https://codeberg.org/attachments/8f69e268-0c70-4bc7-8a12-ed2773800677 -O 97dbf903e1.tgz
tar -zxvf 97dbf903e1.tgz

Then, write a script like this:

#!/bin/bash
set -e
./faulty_mergiraf merge $1/Base.hs $1/Left.hs $1/Right.hs

Let's call this script run_faulty.sh.

Let's check that running the script on the example produces a return code indicating a panic (134):

chmod +x run_faulty.sh
./run_faulty.sh 97dbf903e1/
echo $?

The faulty script successfully fails on the example, yay! At each minimization step, the minimizer will check that the script has this status code and will reject any minimized version of the test case that doesn't pass this test. This makes it possible to minimize the test case under that particular constraint.

Let's do it then:

cargo run -p mgf_dev -- minimize -e 134 97dbf903e1/ ./run_faulty.sh

Note: running this on the current version of this branch seems to run into an infinite loop. From my original version it seems to work.

In such a simple case, writing a script is arguably a bit overkill, so you could just do instead:

cargo run -p mgf_dev -- minimize -e 134 97dbf903e1/ "./faulty_mergiraf merge $1/Base.hs $1/Left.hs $1/Right.hs"`

At the end you have a minimized test case at /tmp/minimized.

Ideas for improvement:

  • make the minimizer run the provided script on the original example, without any change, to check that the expected return code is found. If not, print the error output of the program to help the user debug their testing script
  • make the current version of this branch also work (I haven't looked into what goes wrong)
Sure, here's an attempt at documenting it: First, compile mergiraf on the `bundle-comments` branch (say, in release mode, so that the minimizing goes faster). Let's call that binary `faulty_mergiraf`. Then let's get the example: ``` wget https://codeberg.org/attachments/8f69e268-0c70-4bc7-8a12-ed2773800677 -O 97dbf903e1.tgz tar -zxvf 97dbf903e1.tgz ``` Then, write a script like this: ```bash #!/bin/bash set -e ./faulty_mergiraf merge $1/Base.hs $1/Left.hs $1/Right.hs ``` Let's call this script `run_faulty.sh`. Let's check that running the script on the example produces a return code indicating a panic (134): ``` chmod +x run_faulty.sh ./run_faulty.sh 97dbf903e1/ echo $? ``` The faulty script successfully fails on the example, yay! At each minimization step, the minimizer will check that the script has this status code and will reject any minimized version of the test case that doesn't pass this test. This makes it possible to minimize the test case under that particular constraint. Let's do it then: ``` cargo run -p mgf_dev -- minimize -e 134 97dbf903e1/ ./run_faulty.sh ``` Note: running this on the current version of this branch seems to run into an infinite loop. From my original version it seems to work. In such a simple case, writing a script is arguably a bit overkill, so you could just do instead: ``` cargo run -p mgf_dev -- minimize -e 134 97dbf903e1/ "./faulty_mergiraf merge $1/Base.hs $1/Left.hs $1/Right.hs"` ``` At the end you have a minimized test case at `/tmp/minimized`. Ideas for improvement: * make the minimizer run the provided script on the original example, without any change, to check that the expected return code is found. If not, print the error output of the program to help the user debug their testing script * make the current version of this branch also work (I haven't looked into what goes wrong)
Owner

Funnily enough, in this particular case, the minimizer does an absolutely excellent job: it minimized the test case to the point that all three revisions (base, left and right) are completely empty, and the panic still happens! Not sure it really helps the debugging that much, because it's a bit degenerate, but hey, it's pretty good minimization :-D

Ah no nevermind I made a mistake - there is no panic for empty files.

<s>Funnily enough, in this particular case, the minimizer does an absolutely excellent job: it minimized the test case to the point that all three revisions (base, left and right) are completely empty, and the panic still happens! Not sure it really helps the debugging that much, because it's a bit degenerate, but hey, it's pretty good minimization :-D </s> Ah no nevermind I made a mistake - there is no panic for empty files.
Author
Owner

Thank you very much, that was very helpful:)

Note: running this on the current version of this branch seems to run into an infinite loop. From my original version it seems to work.

Huh. Looking at the rebase diff, nothing really catches the eye.. I guess I'll try to bisect this

Thank you very much, that was very helpful:) > Note: running this on the current version of this branch seems to run into an infinite loop. From my original version it seems to work. Huh. Looking at the rebase diff, nothing really catches the eye.. I guess I'll try to bisect this
Contributor

Okay, I now understood your explanation from the other thread. I guess if the result parses with treesitter we should be relatively alright? Succesfully compiling real world examples will be hard to impossible to automate. What we could try is running ghc on the files. GHC has multiple phases, first the parser, later renaming, then typechecking. We can probably check that ghc reaches the renamer (where name resolution happens and certainly will fail.) Sounds also a bit brittle. But feel free to ping me when you need more help in this regard.

Okay, I now understood your explanation from the other thread. I guess if the result parses with treesitter we should be relatively alright? Succesfully compiling real world examples will be hard to impossible to automate. What we could try is running ghc on the files. GHC has multiple phases, first the parser, later renaming, then typechecking. We can probably check that ghc reaches the renamer (where name resolution happens and certainly will fail.) Sounds also a bit brittle. But feel free to ping me when you need more help in this regard.
Author
Owner

Yeah I think that's as far as we can go with ghc -- but still, it sounds like a good additional step on top of the tree-sitter parsing, thanks for the tip @maralorn:)

We could maybe start collecting these language-specific testing scripts somewhere in the repo -- not sure whether we'll end up with a lot of them, but they could serve as a kind of documentation examples for future contributors.

Yeah I think that's as far as we can go with ghc -- but still, it sounds like a good additional step on top of the tree-sitter parsing, thanks for the tip @maralorn:) We could maybe start collecting these language-specific testing scripts somewhere in the repo -- not sure whether we'll end up with a lot of them, but they could serve as a kind of documentation examples for future contributors.
Add a simple integration test
All checks were successful
/ test (pull_request) Successful in 37s
0ac0556ed8
wetneb approved these changes 2025-09-11 23:14:12 +02:00
wetneb left a comment
Owner

I propose to finally merge this :)

I propose to finally merge this :)
ada4a left a comment
Author
Owner

one last thing^^

one last thing^^
src/minimize.rs Outdated
@ -0,0 +496,4 @@
}
}
"#
.trim()
Author
Owner

Maybe trim the minimized files instead (if that's even needed?)? The expected files we do control, so it's a bit strange to leave them untrimmed imo

Maybe trim the minimized files instead (if that's even needed?)? The expected files we do control, so it's a bit strange to leave them untrimmed imo
Owner

Hmmm the formatter made this look quite ugly… well I might as well do all the trimming manually

Hmmm the formatter made this look quite ugly… well I might as well do all the trimming manually
Author
Owner

I went ahead and turned the strings into non-raw ones.. imo having the first line on its own line is worth the slight ugliness of escaped quotes.

EDIT: can't approve my1 own PR, so if you agree with this last-second change, feel free to merge^^


  1. well, that's what Codeberg considers it to be, anyway ↩︎

I went ahead and turned the strings into non-raw ones.. imo having the first line on its own line is worth the slight ugliness of escaped quotes. EDIT: can't approve my[^1] own PR, so if you agree with this last-second change, feel free to merge^^ [^1]: well, that's what Codeberg considers it to be, anyway
ada4a marked this conversation as resolved
wetneb merged commit 13f232a960 into main 2025-09-11 23:47:23 +02:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: mergiraf/mergiraf#527
No description provided.