Broader test suite for benchmarks #485

Open
opened 2025-07-09 10:26:58 +02:00 by wetneb · 10 comments
Owner

Our existing set of examples/ is focused on minimal, artificial test cases to check that a specific type of conflict resolution is supported.

For some types of changes, such as tweaks to the matching algorithms, it would be good to have an overview of the impact on real-world usage. During initial development, I've used [the replication test suite of Spork](https://github.com/ASSERT-KTH/spork/tree/master/replication) for this. It's a set of real merge scenarios extracted from merge commits found in the wild. The expected merge file is set to be the one found in the merge commit. With such a dataset, one can run mergiraf on all cases and compare its output to the extracted ones. Then we can categorize those based on whether mergiraf returns:

  • conflicts, in which case its output is almost certainly different from the extracted file (unless merge conflicts were committed into the merge…)
  • a clean merge that's exactly identical to the extracted file. In this case we are obviously very happy.
  • a clean merge that's commutatively isomorphic to the extracted file. Perhaps the formatting isn't perfect, but we are still pretty happy
  • a clean merge that's different from the extracted file. This is the category we want to reduce as much as possible, even though this will include plenty of cases where mergiraf's output is actually fine.
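
For illustration, a possible categorization loop could look something like the sketch below (the case layout, the way mergiraf is invoked and the `is_isomorphic` helper are assumptions on my side, not existing tooling):

```python
import subprocess
from pathlib import Path


def is_isomorphic(merged: str, expected: str) -> bool:
    # Placeholder: would delegate to a structural comparison of the two files,
    # e.g. parsing both and comparing trees up to reordering of commutative children.
    raise NotImplementedError


def categorize(case: Path, ext: str) -> str:
    """Classify one extracted merge scenario into the categories listed above.

    Assumes each case directory contains Base/Left/Right/Expected files and that
    the merged result ends up on stdout (the actual CLI flags may differ).
    """
    result = subprocess.run(
        ["mergiraf", "merge",
         case / f"Base{ext}", case / f"Left{ext}", case / f"Right{ext}"],
        capture_output=True, text=True)
    merged = result.stdout
    expected = (case / f"Expected{ext}").read_text()
    if "<<<<<<<" in merged:
        return "conflicts"
    if merged == expected:
        return "identical"
    if is_isomorphic(merged, expected):
        return "isomorphic"
    return "different"
```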

I would like:

  1. a big dataset containing extracted merge cases, like Spork's replication test suite but for any file types, and only containing the cases where line-based merging produces conflicts
  2. a utility to benchmark mergiraf on this dataset, computing statistics about each of the categories above, broken down by file type. The helpers/suite.sh script is not far from that.
  3. (cherry on the cake) a CI pipeline we can trigger on demand on PRs to produce a summary of the changes to the benchmark statistics before and after the PR. Something that would look like this:
| Glob | Total cases | Conflicts | Identical | Isomorphic | Different |
| --------- | ----------- | --------- | --------- | ---------- | --------- |
| `*.rs` | 1815 | +16 (+1%) | +0 | +0 | -16 (-1%) |
| `go.mod` | … | … | … | … | … |
| … | … | … | … | … | … |
| **Total** | … | … | … | … | … |
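
The per-glob deltas in such a summary could be computed along these lines (sketch; the per-category `Counter` per glob is a hypothetical intermediate format produced by the benchmark utility):

```python
from collections import Counter

CATEGORIES = ["Conflicts", "Identical", "Isomorphic", "Different"]


def delta_row(glob: str, before: Counter, after: Counter) -> str:
    """Render one row of the before/after summary as a markdown table line."""
    total = sum(before.values())
    cells = [f"`{glob}`", str(total)]
    for category in CATEGORIES:
        delta = after[category] - before[category]
        percent = 100 * delta / total if total else 0
        # omit the percentage when nothing changed, as in the example table above
        cells.append(f"{delta:+d}" if delta == 0 else f"{delta:+d} ({percent:+.0f}%)")
    return "| " + " | ".join(cells) + " |"


# Example:
# delta_row("*.rs",
#           Counter(Conflicts=1000, Identical=600, Isomorphic=100, Different=115),
#           Counter(Conflicts=1016, Identical=600, Isomorphic=100, Different=99))
# -> "| `*.rs` | 1815 | +16 (+1%) | +0 | +0 | -16 (-1%) |"
```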

I think this would be really helpful to tweak the matching heuristics (such as for #406 or #325), because I feel quite anxious about making changes to it without this feedback.

I'm working on the dataset collection, which I plan to do via https://www.softwareheritage.org/.

Author
Owner

Other things that would be interesting to track:

  • the proportion of files where we fail to parse one of the source files (useful metric to track when upgrading a parser to a new version)
  • of course, performance metrics. Average time to merge? Or time to merge per kilobytes of input files? Or per bytes of conflicts? Not sure what's the most sensible thing.

Also, it would be good to be able to run the test suite only on certain file types (typically for parser updates or changes to specific language profiles).
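
Filtering could be as simple as matching the case's file name against the requested globs (sketch; the `--only` option name is hypothetical):

```python
from fnmatch import fnmatch


def selected(case_filename: str, only_globs: list[str]) -> bool:
    """Keep a case if it matches any requested glob; keep everything when no glob is given."""
    return not only_globs or any(fnmatch(case_filename, glob) for glob in only_globs)


# selected("Foo.java", ["*.java", "*.xml"])  -> True
# selected("Foo.rs", ["*.java", "*.xml"])    -> False
```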

wetneb self-assigned this 2025-07-16 01:20:02 +02:00
Author
Owner

I'm slowly getting there! Here is a table summarizing a benchmark on a small dataset of real merge scenarios extracted from merge commits:

| Language | Cases | Conflict | Exact | Format | Differ | Parse | Panic | Time (s) |
| -------- | ----- | -------- | ----- | ------ | ------ | ----- | ----- | -------- |
| `*.py` | 815 | 550 (67%) | 123 (15%) | 45 (6%) | 87 (11%) | 7 (1%) | 3 (0%) | 0.187 |
| `*.cpp` | 461 | 156 (34%) | 44 (10%) | 14 (3%) | 36 (8%) | 207 (45%) | 4 (1%) | 0.129 |
| `*.js` | 1514 | 1129 (75%) | 131 (9%) | 52 (3%) | 151 (10%) | 48 (3%) | 3 (0%) | 0.731 |
| `*.java` | 327 | 181 (55%) | 59 (18%) | 18 (6%) | 55 (17%) | 14 (4%) | 0 | 0.131 |
| **Total** | 3117 | 2016 (65%) | 357 (11%) | 129 (4%) | 329 (11%) | 276 (9%) | 10 (0%) | 0.437 |

Here's what the columns mean:

  • Cases: the total number of test cases considered. Line-based merging returns conflicts for all of them (using `git merge-file` with Myers diff)
  • Conflict: mergiraf also returns conflicts
  • Exact: mergiraf returns exactly the merge that is recorded in the merge commit (up to blank lines)
  • Format: mergiraf returns something slightly different, but commutatively isomorphic to the target file from the merge commit
  • Differ: mergiraf returns a conflict-free merge output, that isn't commutatively isomorphic to the target file
  • Parse: one of the source revisions failed to parse with the associated tree-sitter parser (so mergiraf fell back on line-based merging)
  • Panic: mergiraf panicked during the merge process
  • Time: the average duration of the `mergiraf merge` process, in seconds
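
For the Exact column, "up to blank lines" boils down to something like this comparison (simplified sketch, the actual normalization might differ slightly):

```python
def equal_up_to_blank_lines(merged: str, expected: str) -> bool:
    """Compare two files while ignoring empty and whitespace-only lines."""
    def significant_lines(text: str) -> list[str]:
        return [line for line in text.splitlines() if line.strip()]
    return significant_lines(merged) == significant_lines(expected)
```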

So far the panics are all instances of #520, #521 or #333. It's interesting to see that almost half of the `*.cpp` files fail to parse (likely due to pre-processor usage). Also, the `*.js` files take significantly longer - not sure if that's because they are often bigger, or if the parser is inefficient. Of course the numbers aren't so insightful on their own; the real deal will be observing the difference between two versions of mergiraf.

I'm still gathering a wider dataset of test cases (covering all file formats).

Owner

Very impressive!

I think for the "Conflicts" column, it could make sense to weight each merge result by the amount of conflicts created – maybe Mergiraf solves all but one? I guess one could argue for doing the same for "Format" using something like tree-edit-distance, but that might come too close to an attempt to white-wash Mergiraf 😅

Author
Owner

I think for the "Conflicts" column, it could make sense to weight each merge result by the amount of conflicts created – maybe Mergiraf solves all but one?

What sort of weighting are you thinking about?
For me, the number of conflicts solved isn't super important… as long as there are conflicts in the file, I expect that some human will have a look at the merged results to solve the remaining conflicts, and so if there are issues with the resolution of any conflicts, they are likely to get noticed in the same go.
Another issue is that there can be more conflicts than in the line-based file, because they're narrower. So we could also track the conflict mass, but then again I'm not sure how to report that in the table.

Owner

Hm, my idea would be something like `conflict_mass(structured_merge) / conflict_mass(line_based_merge)`, but I guess that's not guaranteed to be favorable to us, given that a fully structured merge can create just _different_ conflicts, which can't really be compared to those created by a line-based merge
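
For concreteness, the conflict_mass I have in mind is just counting the lines that sit inside conflict markers, something like this sketch (one possible definition among others):

```python
def conflict_mass(merged: str) -> int:
    """Count the lines inside conflict markers (markers included)."""
    mass, inside_conflict = 0, False
    for line in merged.splitlines():
        if line.startswith("<<<<<<<"):
            inside_conflict = True
        if inside_conflict:
            mass += 1
        if line.startswith(">>>>>>>"):
            inside_conflict = False
    return mass
```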

Author
Owner

Yes… I guess one category we could still add is one for cases where mergiraf returns conflicts which are identical to the line-based merge output. This would let us distinguish cases where mergiraf really did something.
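
That check could just rerun the line-based merge and compare outputs, roughly like this sketch (assuming both sides use the same conflict marker style):

```python
import subprocess


def same_as_line_based(mergiraf_output: str, left: str, base: str, right: str) -> bool:
    """True if mergiraf's conflicted output is byte-for-byte the line-based merge.

    `git merge-file -p` prints the (possibly conflicted) line-based merge to stdout.
    """
    line_based = subprocess.run(
        ["git", "merge-file", "-p", left, base, right],
        capture_output=True, text=True)
    return mergiraf_output == line_based.stdout
```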

Owner

That does make sense, I think. But now I'm wondering – how do you recognize the case where Mergiraf simply falls back to line-based merging (because one of the sides doesn't parse)? By analyzing the logs, probably? Because this case and the case you mentioned could theoretically get conflated.

Author
Owner

For now I just manually check with `mgf_dev` whether all revisions parse. It's true that in that case the line-based fallback also kicks in, so one would need to make the definitions of the categories clear.

Author
Owner

Just for fun I tried comparing the effect of #522 on benchmark results:

| Language | Cases | Conflict | Exact | Format | Differ | Parse | Panic | Time (s) |
| -------- | ----- | -------- | ----- | ------ | ------ | ----- | ----- | -------- |
| `*.py` | 815 | 546 **(-4)** | 124 **(+1)** | 46 **(+1)** | 89 **(+2)** | 7 | 3 | 0.197 (+0.010) |

I'm still working on improving the rendering. Another useful piece of info would be the test cases that changed, to inspect that they went in the right direction (in particular the ones that land in Differ).
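
A sketch of extracting that, assuming each benchmark run is stored as a mapping from test case to category:

```python
def changed_cases(before: dict[str, str], after: dict[str, str]) -> list[tuple[str, str, str]]:
    """List (case, old category, new category) for every case whose category changed."""
    return sorted(
        (case, before[case], category)
        for case, category in after.items()
        if case in before and before[case] != category
    )


# e.g. the cases that newly land in Differ:
# [case for case, old, new in changed_cases(before, after) if new == "Differ"]
```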

Author
Owner

Here are some results on a big-ish dataset, for mergiraf 0.13.0.

| Language | Cases | Exact | Format | Conflict | Differ | Parse | Panic | Time (s) |
| -------- | ----- | ----- | ------ | -------- | ------ | ----- | ----- | -------- |
| `*.java` | 34,530 | 2,512 (7%) | 1,083 (3%) | 28,312 (82%) | 2,603 (8%) | 18 (0%) | 2 (0%) | 0.121 |
| `*.xml` | 17,491 | 466 (3%) | 38 (0%) | 16,204 (93%) | 447 (3%) | 328 (2%) | 8 (0%) | 0.395 |
| `*.cc` | 15,014 | 708 (5%) | 113 (1%) | 1,671 (11%) | 369 (2%) | 12,148 (81%) | 5 (0%) | 0.156 |
| `*.py` | 10,817 | 1,981 (18%) | 528 (5%) | 6,890 (64%) | 1,373 (13%) | 41 (0%) | 4 (0%) | 0.161 |
| `*.json` | 10,239 | 1,372 (13%) | 334 (3%) | 7,657 (75%) | 813 (8%) | 43 (0%) | 20 (0%) | 0.208 |
| `*.php` | 10,015 | 1,822 (18%) | 488 (5%) | 6,504 (65%) | 1,038 (10%) | 158 (2%) | 5 (0%) | 0.363 |
| `*.js` | 9,795 | 1,323 (14%) | 456 (5%) | 6,349 (65%) | 1,197 (12%) | 454 (5%) | 16 (0%) | 1.288 |
| `*.h` | 8,096 | 493 (6%) | 110 (1%) | 3,079 (38%) | 372 (5%) | 4,037 (50%) | 5 (0%) | 0.053 |
| `*.md` | 5,997 | 412 (7%) | 36 (1%) | 4,911 (82%) | 458 (8%) | 168 (3%) | 12 (0%) | 0.443 |
| `*.c` | 5,841 | 244 (4%) | 56 (1%) | 1,449 (25%) | 217 (4%) | 3,846 (66%) | 29 (0%) | 0.151 |
| `*.cpp` | 4,671 | 290 (6%) | 89 (2%) | 2,271 (49%) | 153 (3%) | 1,846 (40%) | 22 (0%) | 0.276 |
| `*.ts` | 4,458 | 908 (20%) | 291 (7%) | 2,474 (55%) | 664 (15%) | 117 (3%) | 4 (0%) | 0.264 |
| `*.scala` | 3,633 | 935 (26%) | 405 (11%) | 1,166 (32%) | 378 (10%) | 747 (21%) | 2 (0%) | 0.078 |
| `*.go` | 3,333 | 549 (16%) | 156 (5%) | 2,128 (64%) | 489 (15%) | 11 (0%) | 0 | 0.247 |
| `*.hpp` | 2,212 | 30 (1%) | 8 (0%) | 1,624 (73%) | 30 (1%) | 514 (23%) | 6 (0%) | 0.021 |
| `*.html` | 2,157 | 170 (8%) | 36 (2%) | 1,346 (62%) | 184 (9%) | 420 (19%) | 1 (0%) | 0.438 |
| `*.cs` | 1,651 | 182 (11%) | 178 (11%) | 981 (59%) | 174 (11%) | 130 (8%) | 6 (0%) | 0.127 |
| `*.rs` | 1,383 | 297 (21%) | 95 (7%) | 720 (52%) | 258 (19%) | 13 (1%) | 0 | 0.811 |
| `*.yml` | 1,256 | 190 (15%) | 62 (5%) | 828 (66%) | 148 (12%) | 28 (2%) | 0 | 0.101 |
| `*.rb` | 898 | 129 (14%) | 43 (5%) | 619 (69%) | 97 (11%) | 10 (1%) | 0 | 0.067 |
| `*.kt` | 622 | 164 (26%) | 28 (5%) | 320 (51%) | 79 (13%) | 31 (5%) | 0 | 0.079 |
| `*.tsx` | 602 | 171 (28%) | 92 (15%) | 218 (36%) | 114 (19%) | 7 (1%) | 0 | 0.119 |
| `*.toml` | 597 | 180 (30%) | 25 (4%) | 288 (48%) | 104 (17%) | 0 | 0 | 0.023 |
| `*.properties` | 491 | 34 (7%) | 10 (2%) | 386 (79%) | 61 (12%) | 0 | 0 | 0.094 |
| `*.jsx` | 391 | 73 (19%) | 11 (3%) | 181 (46%) | 50 (13%) | 76 (19%) | 0 | 0.086 |
| `*.yaml` | 334 | 45 (13%) | 8 (2%) | 237 (71%) | 23 (7%) | 21 (6%) | 0 | 1.103 |
| `*.mk` | 211 | 16 (8%) | 7 (3%) | 133 (63%) | 15 (7%) | 40 (19%) | 0 | 0.007 |
| `*.dart` | 185 | 28 (15%) | 21 (11%) | 50 (27%) | 83 (45%) | 3 (2%) | 0 | 0.039 |
| `*.ini` | 125 | 4 (3%) | 2 (2%) | 60 (48%) | 3 (2%) | 56 (45%) | 0 | 0.001 |
| `*.lua` | 39 | 5 (13%) | 0 | 26 (67%) | 8 (21%) | 0 | 0 | 0.092 |
| `*.hs` | 27 | 1 (4%) | 1 (4%) | 23 (85%) | 1 (4%) | 1 (4%) | 0 | 0.210 |
| `*.phtml` | 20 | 6 (30%) | 3 (15%) | 3 (15%) | 5 (25%) | 3 (15%) | 0 | 0.045 |
| `*.sbt` | 13 | 3 (23%) | 0 | 9 (69%) | 1 (8%) | 0 | 0 | 0.022 |
| `*.htm` | 11 | 0 | 0 | 7 (64%) | 0 | 4 (36%) | 0 | 0.315 |
| `*.ex` | 7 | 3 (43%) | 0 | 1 (14%) | 3 (43%) | 0 | 0 | 0.120 |
| `*.exs` | 7 | 1 (14%) | 0 | 4 (57%) | 2 (29%) | 0 | 0 | 0.139 |
| `*.hh` | 4 | 0 | 0 | 3 (75%) | 1 (25%) | 0 | 0 | 0.050 |
| `*.mjs` | 2 | 1 (50%) | 1 (50%) | 0 | 0 | 0 | 0 | 0.220 |
| `*.hxx` | 2 | 0 | 0 | 1 (50%) | 0 | 1 (50%) | 0 | 1.745 |
| **Total** | 157,177 | 15,748 (10%) | 4,814 (3%) | 99,133 (63%) | 12,015 (8%) | 25,320 (16%) | 147 (0%) | 0.281 |