tests: Utility to minimize test cases, continued #527

ada4a · 2025-07-23T17:34:05+02:00

ada4a commented

2025-07-23 17:34:05 +02:00

supersedes #469

this time I opened the branch on the main repo, so that we could collaborate on this if needed

WIP that will eventually close #456.

Warning: it's ugly! Lots of code duplication. No unit tests.

I am not sure what is the right place for this sort of code (if we want to have it at all in this repo :-D ). Intuitively, mgf_dev is quite fitting, but we need access to a lot of Mergiraf internals which have only been pub(crate) so far, and we don't have access to that as things stand. Making them fully public is possible, and after all we don't claim to maintain any sort of stability for the Rust API, so maybe it's not a big deal. But if we wanted to offer some sort of public interface in the future, this would probably make it harder.

Arguably, one could also put the code in the root crate, and only expose the minimization function publicly, which could then be used by mgf_dev. For the main mergiraf binary, it would be dead code, so it wouldn't make it into the binary, but probably still inflate the compilation time a bit, and our test coverage would get a big hit (we could have a big integration test that covers most things, but it's a bit meh).

supersedes #469 this time I opened the branch on the main repo, so that we could collaborate on this if needed WIP that will eventually close #456. **Warning**: it's ugly! Lots of code duplication. No unit tests. I am not sure what is the right place for this sort of code (if we want to have it at all in this repo :-D ). Intuitively, `mgf_dev` is quite fitting, but we need access to a lot of Mergiraf internals which have only been `pub(crate) `so far, and we don't have access to that as things stand. Making them fully public is possible, and after all we don't claim to maintain any sort of stability for the Rust API, so maybe it's not a big deal. But if we wanted to offer some sort of public interface in the future, this would probably make it harder. Arguably, one could also put the code in the root crate, and only expose the minimization function publicly, which could then be used by `mgf_dev`. For the main `mergiraf` binary, it would be dead code, so it wouldn't make it into the binary, but probably still inflate the compilation time a bit, and our test coverage would get a big hit (we could have a big integration test that covers most things, but it's a bit meh).

ada4a added 19 commits

2025-07-23 17:34:06 +02:00

Introduce minimize command dbcaa82f1b

Make max_failures and max_steps configurable a9b6c0cc2e

Look for nodes to delete in all three revisions e3b346ad4c

Flag to only delete nodes that are unchanged in all revisions ab7d594d52

move rand to workspace deps 8f157d320f

split args into multiple args a23a785642

restrict mutability of nodes_to_delete d3121f67cd

use choose 5dadce08b1

rm explicit short = ... for cli flags d669807f27

extract will_recurse 7d1fa725aa

cows b2593471fd

move detect_suffix to under src/utils.rs 75c362cc40

move read_file_to_string to under src/utils.rs 19fc2b381b

also move `write_string_to_file` while we're at it

use Option::as_deref ae12041b7e

inline passes_test af9d75de1b

everything is pretty clear even without it

invert if to get rid of negated condition 3c2e8eef8a

also reads a bit more naturally imo

use let-chains 29e0ae48fc

create pick_nodes_to_delete_internal 44d1b0d328

use actual logging

/ test (pull_request) Failing after 46s

Details

f60cb60fa3

ada4a added 3 commits

2025-07-23 19:12:24 +02:00

move minimize.rs to mergiraf-the-lib f93027490d

use pretty_print directly 7ece86601c

removes the need to expose `to_merged_text`.. kind of

clippy

/ test (pull_request) Successful in 1m39s

Details

b60e47025e

ada4a force-pushed minimize from b60e47025e to 0ba709127b

2025-07-24 19:38:04 +02:00

Compare

ada4a force-pushed minimize from 0ba709127b to 7dab7e1979

2025-07-24 19:39:41 +02:00

Compare

ada4a force-pushed minimize from 7dab7e1979 to a51b43b8bd

2025-07-24 19:42:37 +02:00

Compare

ada4a added 1 commit

2025-07-25 19:35:19 +02:00

add cargo alias

/ test (pull_request) Successful in 44s

Details

1f6eaa3149

ada4a force-pushed minimize from 1f6eaa3149 to 29fd1ecc2b

2025-07-26 15:01:30 +02:00

Compare

ada4a force-pushed minimize from 29fd1ecc2b to 6c285171d6

2025-08-01 14:14:44 +02:00

Compare

ada4a force-pushed minimize from 6c285171d6 to 74213cdbbb

2025-08-14 17:21:55 +02:00

Compare

ada4a changed title from ~~WIP: tests: Utility to minimize test cases, continued~~ to tests: Utility to minimize test cases, continued

2025-08-31 12:42:03 +02:00

ada4a commented

2025-08-31 12:42:46 +02:00

I think this is good enough as a start -- should we merge it?

ada4a commented

2025-08-31 12:52:11 +02:00

Oh, wait, there are some things to fix up -- give me a sec

ada4a force-pushed minimize from 74213cdbbb to 66b4b5bde8

2025-08-31 13:02:06 +02:00

Compare

ada4a commented

2025-08-31 13:44:34 +02:00

Ok should be good to go now

ada4a referenced this pull request

2025-08-31 13:45:01 +02:00

feat: bundle comments into nodes they annotate #576

ada4a referenced this pull request

2025-08-31 14:10:57 +02:00

feat: bundle comments into nodes they annotate #576

ada4a commented

2025-08-31 14:29:38 +02:00

@wetneb wrote in #576 (comment):

The script should just call mergiraf on the example. You can then pass the expected status code (134, if I remember well?) to the minimizer.

Sorry, still not quite following you. Looking at the source code, I think the testing command invocation ends up looking like:

bash -c <script> testing_script <output directory>

Firstly, what does testing_script in the middle of the invocation achieve?.. I thought you'd want to pass to <script> the <output_directory>, not this literal string?

Secondly, what I understand "just call mergiraf on the example" to mean is to run mergiraf merge? But doesn't that require me to know what the paths to base, left, and right files are (most importantly, their extensions)? Knowing <output_directory> only brings me half-way.

The purpose of having a configurable script is for the cases where you want to ensure some additional properties on the input files or the merge output, such as the presence of a conflict or of a certain type of content.

I'm completely blanking on this one... I think having a couple of example mgf_dev minimize invocations would really go a long way

@wetneb wrote in https://codeberg.org/mergiraf/mergiraf/pulls/576#issuecomment-6832207: > The script should just call mergiraf on the example. You can then pass the expected status code (134, if I remember well?) to the minimizer. Sorry, still not quite following you. Looking at the source code, I think the testing command invocation ends up looking like: ``` bash -c <script> testing_script <output directory> ``` Firstly, what does `testing_script` in the middle of the invocation achieve?.. I thought you'd want to pass to `<script>` the `<output_directory>`, not this literal string? Secondly, what I understand "just call mergiraf on the example" to mean is to run `mergiraf merge`? But doesn't that require me to know what the paths to base, left, and right files are (most importantly, their extensions)? Knowing `<output_directory>` only brings me half-way. > The purpose of having a configurable script is for the cases where you want to ensure some additional properties on the input files or the merge output, such as the presence of a conflict or of a certain type of content. I'm completely blanking on this one... I think having a couple of example `mgf_dev minimize` invocations would really go a long way

wetneb commented

2025-08-31 15:13:16 +02:00

Sure, here's an attempt at documenting it:

First, compile mergiraf on the bundle-comments branch (say, in release mode, so that the minimizing goes faster). Let's call that binary faulty_mergiraf.

Then let's get the example:

wget https://codeberg.org/attachments/8f69e268-0c70-4bc7-8a12-ed2773800677 -O 97dbf903e1.tgz
tar -zxvf 97dbf903e1.tgz

Then, write a script like this:

#!/bin/bash
set -e
./faulty_mergiraf merge $1/Base.hs $1/Left.hs $1/Right.hs

Let's call this script run_faulty.sh.

Let's check that running the script on the example produces a return code indicating a panic (134):

chmod +x run_faulty.sh
./run_faulty.sh 97dbf903e1/
echo $?

The faulty script successfully fails on the example, yay! At each minimization step, the minimizer will check that the script has this status code and will reject any minimized version of the test case that doesn't pass this test. This makes it possible to minimize the test case under that particular constraint.

Let's do it then:

cargo run -p mgf_dev -- minimize -e 134 97dbf903e1/ ./run_faulty.sh

Note: running this on the current version of this branch seems to run into an infinite loop. From my original version it seems to work.

In such a simple case, writing a script is arguably a bit overkill, so you could just do instead:

cargo run -p mgf_dev -- minimize -e 134 97dbf903e1/ "./faulty_mergiraf merge $1/Base.hs $1/Left.hs $1/Right.hs"`

At the end you have a minimized test case at /tmp/minimized.

Ideas for improvement:

make the minimizer run the provided script on the original example, without any change, to check that the expected return code is found. If not, print the error output of the program to help the user debug their testing script
make the current version of this branch also work (I haven't looked into what goes wrong)

Sure, here's an attempt at documenting it: First, compile mergiraf on the `bundle-comments` branch (say, in release mode, so that the minimizing goes faster). Let's call that binary `faulty_mergiraf`. Then let's get the example: ``` wget https://codeberg.org/attachments/8f69e268-0c70-4bc7-8a12-ed2773800677 -O 97dbf903e1.tgz tar -zxvf 97dbf903e1.tgz ``` Then, write a script like this: ```bash #!/bin/bash set -e ./faulty_mergiraf merge $1/Base.hs $1/Left.hs $1/Right.hs ``` Let's call this script `run_faulty.sh`. Let's check that running the script on the example produces a return code indicating a panic (134): ``` chmod +x run_faulty.sh ./run_faulty.sh 97dbf903e1/ echo $? ``` The faulty script successfully fails on the example, yay! At each minimization step, the minimizer will check that the script has this status code and will reject any minimized version of the test case that doesn't pass this test. This makes it possible to minimize the test case under that particular constraint. Let's do it then: ``` cargo run -p mgf_dev -- minimize -e 134 97dbf903e1/ ./run_faulty.sh ``` Note: running this on the current version of this branch seems to run into an infinite loop. From my original version it seems to work. In such a simple case, writing a script is arguably a bit overkill, so you could just do instead: ``` cargo run -p mgf_dev -- minimize -e 134 97dbf903e1/ "./faulty_mergiraf merge $1/Base.hs $1/Left.hs $1/Right.hs"` ``` At the end you have a minimized test case at `/tmp/minimized`. Ideas for improvement: * make the minimizer run the provided script on the original example, without any change, to check that the expected return code is found. If not, print the error output of the program to help the user debug their testing script * make the current version of this branch also work (I haven't looked into what goes wrong)

wetneb commented

2025-08-31 15:33:39 +02:00

Funnily enough, in this particular case, the minimizer does an absolutely excellent job: it minimized the test case to the point that all three revisions (base, left and right) are completely empty, and the panic still happens! Not sure it really helps the debugging that much, because it's a bit degenerate, but hey, it's pretty good minimization :-D

Ah no nevermind I made a mistake - there is no panic for empty files.

<s>Funnily enough, in this particular case, the minimizer does an absolutely excellent job: it minimized the test case to the point that all three revisions (base, left and right) are completely empty, and the panic still happens! Not sure it really helps the debugging that much, because it's a bit degenerate, but hey, it's pretty good minimization :-D </s> Ah no nevermind I made a mistake - there is no panic for empty files.

😆 1

ada4a commented

2025-08-31 15:36:07 +02:00

Thank you very much, that was very helpful:)

Note: running this on the current version of this branch seems to run into an infinite loop. From my original version it seems to work.

Huh. Looking at the rebase diff, nothing really catches the eye.. I guess I'll try to bisect this

Thank you very much, that was very helpful:) > Note: running this on the current version of this branch seems to run into an infinite loop. From my original version it seems to work. Huh. Looking at the rebase diff, nothing really catches the eye.. I guess I'll try to bisect this

maralorn commented

2025-08-31 16:30:39 +02:00

Okay, I now understood your explanation from the other thread. I guess if the result parses with treesitter we should be relatively alright? Succesfully compiling real world examples will be hard to impossible to automate. What we could try is running ghc on the files. GHC has multiple phases, first the parser, later renaming, then typechecking. We can probably check that ghc reaches the renamer (where name resolution happens and certainly will fail.) Sounds also a bit brittle. But feel free to ping me when you need more help in this regard.

ada4a commented

2025-08-31 16:43:07 +02:00

Yeah I think that's as far as we can go with ghc -- but still, it sounds like a good additional step on top of the tree-sitter parsing, thanks for the tip @maralorn:)

We could maybe start collecting these language-specific testing scripts somewhere in the repo -- not sure whether we'll end up with a lot of them, but they could serve as a kind of documentation examples for future contributors.

Yeah I think that's as far as we can go with ghc -- but still, it sounds like a good additional step on top of the tree-sitter parsing, thanks for the tip @maralorn:) We could maybe start collecting these language-specific testing scripts somewhere in the repo -- not sure whether we'll end up with a lot of them, but they could serve as a kind of documentation examples for future contributors.

ada4a force-pushed minimize from 66b4b5bde8 to 5b9fdae881

2025-09-09 21:40:02 +02:00

Compare

wetneb force-pushed minimize from 5b9fdae881 to 46e7b9596f

2025-09-11 23:08:53 +02:00

Compare

wetneb added 1 commit

2025-09-11 23:09:03 +02:00

Add a simple integration test

/ test (pull_request) Successful in 37s

Details

0ac0556ed8

wetneb approved these changes

2025-09-11 23:14:12 +02:00

wetneb left a comment

I propose to finally merge this :)

ada4a reviewed

2025-09-11 23:16:23 +02:00

ada4a left a comment

one last thing^^

src/minimize.rs Outdated

					
				@ -0,0 +496,4 @@

				    }

				}

				        "#

				            .trim()

ada4a commented

2025-09-11 23:14:43 +02:00

Maybe trim the minimized files instead (if that's even needed?)? The expected files we do control, so it's a bit strange to leave them untrimmed imo

wetneb commented

2025-09-11 23:19:59 +02:00

Hmmm the formatter made this look quite ugly… well I might as well do all the trimming manually

❤️ 1

ada4a commented

2025-09-11 23:29:21 +02:00

I went ahead and turned the strings into non-raw ones.. imo having the first line on its own line is worth the slight ugliness of escaped quotes.

EDIT: can't approve my¹ own PR, so if you agree with this last-second change, feel free to merge^^

well, that's what Codeberg considers it to be, anyway ↩︎

I went ahead and turned the strings into non-raw ones.. imo having the first line on its own line is worth the slight ugliness of escaped quotes. EDIT: can't approve my[^1] own PR, so if you agree with this last-second change, feel free to merge^^ [^1]: well, that's what Codeberg considers it to be, anyway

👍 1

ada4a marked this conversation as resolved