fix(typescript): Bad handling of semicolons in commutative merging of class bodies #585

Merged
wetneb merged 4 commits from tsx_separators into main 2025-09-11 23:01:54 +02:00
Owner

This is a minimized version of a bug reported by Stainless.
The ; are not recognized as separators in the class_body, so the merging algorithm is treating them as independent elements that are being added.
One solution would be to fix the Typescript grammar so that those elements are bundled into the attributes they come at the end of.
One could perhaps come up with other solutions on Mergiraf's side, but it looks like we would need to have to relax the current restriction we have that each separator associated to a commutative group must be equal to the default separator for the commutative parent, up to whitespace.

This is a minimized version of a bug reported by Stainless. The `;` are not recognized as separators in the `class_body`, so the merging algorithm is treating them as independent elements that are being added. One solution would be to fix the Typescript grammar so that those elements are bundled into the attributes they come at the end of. One could perhaps come up with other solutions on Mergiraf's side, but it looks like we would need to have to relax the current restriction we have that each separator associated to a commutative group must be equal to the default separator for the commutative parent, up to whitespace.
Add failing example
Some checks failed
/ test (pull_request) Failing after 42s
a475276f12
Owner

One could perhaps come up with other solutions on Mergiraf's side

You probably mean creating a ChildrenGroup for (public)_field_definitions which has ; as its separator? That honestly sounds like the most logical solution to the problem at hand, even if it would require relaxing the separator restriction, because the latter is imo also pretty sensible, even on its own. Should I try to implement that?

I think the only problematic place is commutatively_merge_lists, where we first call keep_content_only on each side, which removes the separators and the delimiters, in order to be able to call child_separator to find what the actual separator is. I think the way we would adjust that is to not use keep_content_only at all, and instead make child_separator smarter:

  1. make it take nodes directly, not their kinds
  2. for each ChildrenGroup it considers:
    1. filter out all the nodes which match the group's separator/delimiter (we should also somehow include the CommutativeParent's default separator in this calculation)
    2. check whether all the remaining nodes' kinds are accepted by the group

This would be somewhat more computationally expensive, but I don't really see a way around it?...

> One could perhaps come up with other solutions on Mergiraf's side You probably mean creating a `ChildrenGroup` for `(public)_field_definition`s which has `;` as its separator? That honestly sounds like the most logical solution to the problem at hand, even if it would require relaxing the separator restriction, because the latter is imo also pretty sensible, even on its own. Should I try to implement that? I think the only problematic place is `commutatively_merge_lists`, where we first call `keep_content_only` on each side, which removes the separators and the delimiters, in order to be able to call `child_separator` to find what the _actual_ separator is. I think the way we would adjust that is to not use `keep_content_only` at all, and instead make `child_separator` smarter: 1. make it take nodes directly, not their kinds 2. for each `ChildrenGroup` it considers: 1. filter out all the nodes which match the group's separator/delimiter (we should also somehow include the `CommutativeParent`'s default separator in this calculation) 2. check whether all the remaining nodes' kinds are accepted by the group This would be somewhat more computationally expensive, but I don't really see a way around it?...
Owner

One question -- currently, we first map each node to a leader (in keep_contents), then map those back to all representatives, and for each of those, check whether they're extra. But isn't the "extraness" of a leader the same as of all its representatives, similarly to the grammar kind?

One question -- currently, we first map each node to a leader (in `keep_contents`), then map those back to _all_ representatives, and for each of those, check whether they're extra. But isn't the "extraness" of a leader the same as of all its representatives, similarly to the grammar kind?
Owner

So, here's what I came up with. Interestingly enough, it deviated wildly from the original code during development, but ended up returning to more or less the same thing in the end

So, here's what I came up with. Interestingly enough, it deviated wildly from the original code during development, but ended up returning to more or less the same thing in the end
wetneb changed title from WIP: fix(typescript): Bad handling of semicolons in commutative merging of class bodies to fix(typescript): Bad handling of semicolons in commutative merging of class bodies 2025-09-11 18:20:36 +02:00
Author
Owner

Really cool, thank you so much for the fix! I think it makes a lot of sense like this.

Really cool, thank you so much for the fix! I think it makes a lot of sense like this.
ada4a force-pushed tsx_separators from 2e663fb381 to ed788431dc 2025-09-11 20:24:34 +02:00 Compare
Owner

Thank you, I am quite proud of this one:)

Thank you, I am quite proud of this one:)
ada4a force-pushed tsx_separators from ed788431dc to 3a2c6f985c 2025-09-11 20:28:30 +02:00 Compare
Author
Owner

The effect on merge output seems unambiguously positive too (the timing difference shouldn't be meaningful though, the load on the machine was variable):

Language Cases Exact Format Conflict Differ Parse Panic Time (s)
*.ts 11,230 +30 (+0.3%) +9 (+0.1%) +39 (+0.3%) -78 (-0.7%) +0 +0 -0.011 (-7.2%)
↓ Before \ After → Exact Format Conflict Differ Parse Panic
Exact 2,347 1
Format 7 729 3
Conflict 2 1 6,202
Differ 22 18 38 1,672
Parse 187
Panic 1
The effect on merge output seems unambiguously positive too (the timing difference shouldn't be meaningful though, the load on the machine was variable): | Language | Cases | Exact | Format | Conflict | Differ | Parse | Panic | Time (s) | | -------- | ----- | ----- | ------ | -------- | ------ | ----- | ----- | -------- | | `*.ts` | 11,230 | +30 **(+0.3%)** | +9 **(+0.1%)** | +39 **(+0.3%)** | -78 **(-0.7%)** | +0 | +0 | -0.011 (-7.2%) | | ↓ Before \ After → | Exact | Format | Conflict | Differ | Parse | Panic | | ------------------ | ----- | ------ | -------- | ------ | ----- | ----- | | Exact | 2,347 | | **1** | | | | | Format | **7** | 729 | **3** | | | | | Conflict | **2** | **1** | 6,202 | | | | | Differ | **22** | **18** | **38** | 1,672 | | | | Parse | | | | | 187 | | | Panic | | | | | | 1 |
Owner

How nice!

How nice!
wetneb merged commit 3a1ff3cc5c into main 2025-09-11 23:01:54 +02:00
ada4a deleted branch tsx_separators 2025-09-17 13:51:05 +02:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: mergiraf/mergiraf#585
No description provided.