chore: Switch to tree-sitter-java-orchard #546

Merged
ada4a merged 2 commits from 540-upgrade-java-parser into main 2025-08-01 08:04:01 +02:00
Owner

Fixes a bunch of parsing failures and introduces fields for #540.

Fixes a bunch of parsing failures and introduces fields for #540.
chore: Switch to tree-sitter-java-orchard
All checks were successful
/ test (pull_request) Successful in 1m42s
b4726d7d80
Fixes a bunch of parsing failures and introduces fields for #540.
@ -1,3 +1,3 @@
class Cls {
final protected static void method() {}
Author
Owner

This is linked to #545. I plan to revisit this later.

This is linked to https://codeberg.org/mergiraf/mergiraf/issues/545. I plan to revisit this later.
@ -135,4 +120,0 @@
"non-sealed",
],
" ",
),
Author
Owner

Nice simplification due to https://github.com/tree-sitter/tree-sitter-java/pull/211 which I integrated in the fork.

Nice simplification due to https://github.com/tree-sitter/tree-sitter-java/pull/211 which I integrated in the fork.
ada4a marked this conversation as resolved
@ -549,2 +549,2 @@
assert_eq!(matching.recovery.len(), 18);
assert_eq!(matching.full.len(), 39);
assert_eq!(matching.recovery.len(), 20);
assert_eq!(matching.full.len(), 41);
Author
Owner
This is a side-effect of https://github.com/tree-sitter/tree-sitter-java/pull/211.
ada4a marked this conversation as resolved
Author
Owner

Interestingly, this seems to have caused a significant slowdown… I'm not sure if it's because my computer was more busy when I ran the second benchmark, of if it's a real issue.
And it's a bit disappointing that it's only solving one parse error. I'll investigate at some point.

Language Cases Exact Format Conflict Differ Parse Panic Time (s)
*.java 34,530 +4 (+0.0%) -4 (-0.0%) -3 (-0.0%) +3 (+0.0%) -1 (-0.0%) +1 (+0.0%) +0.015 (+17.9%)
↓ Before \ After → Exact Format Conflict Differ Parse Panic
Exact 2,516 1
Format 1,087 2 2
Conflict 3 28,272 6 1
Differ 2 3 2,615
Parse 1 17
Panic 2
Interestingly, this seems to have caused a significant slowdown… I'm not sure if it's because my computer was more busy when I ran the second benchmark, of if it's a real issue. And it's a bit disappointing that it's only solving one parse error. I'll investigate at some point. | Language | Cases | Exact | Format | Conflict | Differ | Parse | Panic | Time (s) | | -------- | ----- | ----- | ------ | -------- | ------ | ----- | ----- | -------- | | `*.java` | 34,530 | +4 **(+0.0%)** | -4 **(-0.0%)** | -3 **(-0.0%)** | +3 **(+0.0%)** | -1 **(-0.0%)** | +1 **(+0.0%)** | +0.015 (+17.9%) | | ↓ Before \ After → | Exact | Format | Conflict | Differ | Parse | Panic | | ------------------ | ----- | ------ | -------- | ------ | ----- | ----- | | Exact | 2,516 | | **1** | | | | | Format | | 1,087 | **2** | **2** | | | | Conflict | **3** | | 28,272 | **6** | | **1** | | Differ | **2** | | **3** | 2,615 | | | | Parse | | | **1** | | 17 | | | Panic | | | | | | 2 |
Author
Owner

So, I have some good news: the timing difference is spurious, due to me suspending my computer during the benchmark and it inflating the running time of one example. The updated table is:

Language Cases Exact Format Conflict Differ Parse Panic Time (s)
*.java 34,529 +4 (+0.0%) -4 (-0.0%) -3 (-0.0%) +3 (+0.0%) -1 (-0.0%) +1 (+0.0%) 0.082 (+0)

Then, another piece of "good" news, is that the case that changed from "Conflict" to "Panic" with this change is a new type of panic we didn't know before!

thread '<unnamed>' panicked at src/tree_builder.rs:281:22:
internal error: entered unreachable code: unexpected conflict size: more than two diverging sides!

It probably doesn't have much to do with the grammar update on its own and probably happens in other circumstances, so I'm excited that the benchmark picked it up. And I get to use the minimizer again, yay!

So, I have some good news: the timing difference is spurious, due to me suspending my computer during the benchmark and it inflating the running time of one example. The updated table is: | Language | Cases | Exact | Format | Conflict | Differ | Parse | Panic | Time (s) | | -------- | ----- | ----- | ------ | -------- | ------ | ----- | ----- | -------- | | `*.java` | 34,529 | +4 **(+0.0%)** | -4 **(-0.0%)** | -3 **(-0.0%)** | +3 **(+0.0%)** | -1 **(-0.0%)** | +1 **(+0.0%)** | 0.082 (+0) | Then, another piece of "good" news, is that the case that changed from "Conflict" to "Panic" with this change is a new type of panic we didn't know before! ``` thread '<unnamed>' panicked at src/tree_builder.rs:281:22: internal error: entered unreachable code: unexpected conflict size: more than two diverging sides! ``` It probably doesn't have much to do with the grammar update on its own and probably happens in other circumstances, so I'm excited that the benchmark picked it up. And I get to use the minimizer again, yay!
Add signature to annotations
All checks were successful
/ test (pull_request) Successful in 37s
9bb6ae70f9
wetneb force-pushed 540-upgrade-java-parser from 9bb6ae70f9 to a9b415fee6 2025-07-31 19:08:57 +02:00 Compare
Author
Owner

I have run a new benchmark after rebasing on top of main and adding the signature for annotation_marker nodes. The picture is quite different!

Language Cases Exact Format Conflict Differ Parse Panic Time (s)
*.java 34,530 +6 (+0.0%) +4 (+0.0%) +0 -10 (-0.0%) +0 +0 0.080 (+0)
↓ Before \ After → Exact Format Conflict Differ Parse Panic
Exact 2,521
Format 1,088
Conflict 28,282
Differ 6 4 2,612
Parse 14
Panic 3

The good news is that the change is now entirely positive (only numbers below the diagonal), with mergiraf only solving more conflicts in ways that are consistent with the observed merges.
The panic noticed above is still happening, both before and after this PR. So I think we can merge this and handle the panic separately (I'm on it).

I have run a new benchmark after rebasing on top of main and adding the signature for `annotation_marker` nodes. The picture is quite different! | Language | Cases | Exact | Format | Conflict | Differ | Parse | Panic | Time (s) | | -------- | ----- | ----- | ------ | -------- | ------ | ----- | ----- | -------- | | `*.java` | 34,530 | +6 **(+0.0%)** | +4 **(+0.0%)** | +0 | -10 **(-0.0%)** | +0 | +0 | 0.080 (+0) | | ↓ Before \ After → | Exact | Format | Conflict | Differ | Parse | Panic | | ------------------ | ----- | ------ | -------- | ------ | ----- | ----- | | Exact | 2,521 | | | | | | | Format | | 1,088 | | | | | | Conflict | | | 28,282 | | | | | Differ | **6** | **4** | | 2,612 | | | | Parse | | | | | 14 | | | Panic | | | | | | 3 | The good news is that the change is now entirely positive (only numbers below the diagonal), with mergiraf only solving more conflicts in ways that are consistent with the observed merges. The panic noticed above is still happening, both before and after this PR. So I think we can merge this and handle the panic separately (I'm on it).
Owner

Oh, very nice!

Oh, very nice!
ada4a approved these changes 2025-08-01 08:03:38 +02:00
ada4a merged commit dabfbc475f into main 2025-08-01 08:04:01 +02:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: mergiraf/mergiraf#546
No description provided.