WIP: fix: Check the syntactic validity of merges #321

Closed
wetneb wants to merge 3 commits from wetneb:320-check-syntactic-validity-of-merges into main
Owner

Motivated by #320 and earlier cases like #170, we introduce a check to ensure that our grammar is able to parse the rendered version of a merged file.
This check was already being done for line-based merges: we extend it to structured merges.
Sadly it's not sufficient to fix the motivating example from #320 because the Python grammar is overly forgiving. But it's probably worth having anyway.

Motivated by #320 and earlier cases like #170, we introduce a check to ensure that our grammar is able to parse the rendered version of a merged file. This check was already being done for line-based merges: we extend it to structured merges. Sadly it's not sufficient to fix the motivating example from #320 because the Python grammar is overly forgiving. But it's probably worth having anyway.
wetneb force-pushed 320-check-syntactic-validity-of-merges from 26ae5c6ee4 to 2001e07ce9 2025-04-09 00:24:33 +02:00 Compare
wetneb changed title from fix: Check the syntactic validity of merges to WIP: fix: Check the syntactic validity of merges 2025-04-09 00:33:30 +02:00
Author
Owner

Marking as WIP because I realize this will need tweaking in case the compact mode is on, because we shouldn't expect that re-parsing the revisions derived from a file with conflicts should succeed (because of the added newlines in them).

Marking as WIP because I realize this will need tweaking in case the compact mode is on, because we shouldn't expect that re-parsing the revisions derived from a file with conflicts should succeed (because of the added newlines in them).
wetneb force-pushed 320-check-syntactic-validity-of-merges from 2001e07ce9 to a70b4c0c85 2025-04-27 15:24:17 +02:00 Compare
Author
Owner

Update: since the recent changes around avoiding the re-parsing of line-based merges, this PR would need a bit more work (because it re-introduces some of that).

I'm thinking it would actually make sense to change the rendering of MergedText so that it doesn't directly output a String, but a ParsedMerge instead, which could later be rendered to a String. This means that MergeResult could just have the ParsedMerge as a field (instead of contents: String currently). Does that sound like a good idea to you, @ada4a?

Update: since the recent changes around avoiding the re-parsing of line-based merges, this PR would need a bit more work (because it re-introduces some of that). I'm thinking it would actually make sense to change the rendering of MergedText so that it doesn't directly output a String, but a ParsedMerge instead, which could later be rendered to a String. This means that MergeResult could just have the ParsedMerge as a field (instead of `contents: String` currently). Does that sound like a good idea to you, @ada4a?
Owner

I'm thinking it would actually make sense to change the rendering of MergedText so that it doesn't directly output a String, but a ParsedMerge instead, which could later be rendered to a String

This definitely sounds very appealing! Not needing to duplicate the rendering logic is especially nice. I'm not sure how hard this change would be to implement...

This means that MergeResult could just have the ParsedMerge as a field (instead of contents: String currently)

That would be lovely! We could probably avoid storing cached conflict_mass and conflict_count as well...

> I'm thinking it would actually make sense to change the rendering of MergedText so that it doesn't directly output a String, but a ParsedMerge instead, which could later be rendered to a String This definitely sounds very appealing! Not needing to duplicate the rendering logic is especially nice. I'm not sure how hard this change would be to implement... > This means that MergeResult could just have the ParsedMerge as a field (instead of contents: String currently) That would be lovely! We could probably avoid storing cached `conflict_mass` and `conflict_count` as well...
Author
Owner

After a quick investigation, I think ParsedMerge is actually not a good fit for this, because (as its name indicates) it is geared towards parsed representations of line-based merges. It comes with an indexing logic to be able to generate matchings between revisions, which doesn't play well with representing the output of a structured merge (as it would require us to generate certain byte offsets which are complicated to compute and unnecessary when representing a merge output).

The MergedText struct is much more fitting, because it lets us represent conflicts independently from line boundaries and isn't encumbered by this indexing. This should make it possible to check for syntactic validity regardless of whether the compact mode is on.

After a quick investigation, I think `ParsedMerge` is actually not a good fit for this, because (as its name indicates) it is geared towards *parsed* representations of line-based merges. It comes with an indexing logic to be able to generate matchings between revisions, which doesn't play well with representing the output of a structured merge (as it would require us to generate certain byte offsets which are complicated to compute and unnecessary when representing a merge output). The `MergedText` struct is much more fitting, because it lets us represent conflicts independently from line boundaries and isn't encumbered by this indexing. This should make it possible to check for syntactic validity regardless of whether the compact mode is on.
Owner

Completely agree – and ParsedMerge should be trivially translatable into MergedText

Completely agree – and `ParsedMerge` should be trivially translatable into `MergedText`
Author
Owner

The example in #320 is still considered valid by tree-sitter-python, so the approach followed in this PR will not make it possible to detect its incorrectness.
Instead, I think we should not just parse the revisions, but also check that the parsed trees are consistent with the merged tree they are generated from. I'm opening a new PR for this: #368.

The example in #320 is still considered valid by tree-sitter-python, so the approach followed in this PR will not make it possible to detect its incorrectness. Instead, I think we should not just parse the revisions, but also check that the parsed trees are consistent with the merged tree they are generated from. I'm opening a new PR for this: #368.
wetneb closed this pull request 2025-05-06 09:49:28 +02:00
All checks were successful
/ test (pull_request) Successful in 1m1s

Pull request closed

Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: mergiraf/mergiraf#321
No description provided.