WIP: fix: Check the syntactic validity of merges

wetneb commented

2025-04-09 00:23:43 +02:00

Owner

Motivated by #320 and earlier cases like #170, we introduce a check to ensure that our grammar is able to parse the rendered version of a merged file.
This check was already being done for line-based merges: we extend it to structured merges.
Sadly it's not sufficient to fix the motivating example from #320 because the Python grammar is overly forgiving. But it's probably worth having anyway.

Motivated by #320 and earlier cases like #170, we introduce a check to ensure that our grammar is able to parse the rendered version of a merged file. This check was already being done for line-based merges: we extend it to structured merges. Sadly it's not sufficient to fix the motivating example from #320 because the Python grammar is overly forgiving. But it's probably worth having anyway.

wetneb force-pushed 320-check-syntactic-validity-of-merges from 26ae5c6ee4 to 2001e07ce9

2025-04-09 00:24:33 +02:00

Compare

wetneb changed title from ~~fix: Check the syntactic validity of merges~~ to WIP: fix: Check the syntactic validity of merges

2025-04-09 00:33:30 +02:00

wetneb commented

2025-04-09 00:34:31 +02:00

Author

Owner

Marking as WIP because I realize this will need tweaking in case the compact mode is on, because we shouldn't expect that re-parsing the revisions derived from a file with conflicts should succeed (because of the added newlines in them).

wetneb force-pushed 320-check-syntactic-validity-of-merges from 2001e07ce9 to a70b4c0c85

2025-04-27 15:24:17 +02:00

Compare

wetneb commented

2025-04-28 18:52:45 +02:00

Author

Owner

Update: since the recent changes around avoiding the re-parsing of line-based merges, this PR would need a bit more work (because it re-introduces some of that).

I'm thinking it would actually make sense to change the rendering of MergedText so that it doesn't directly output a String, but a ParsedMerge instead, which could later be rendered to a String. This means that MergeResult could just have the ParsedMerge as a field (instead of contents: String currently). Does that sound like a good idea to you, @ada4a?

Update: since the recent changes around avoiding the re-parsing of line-based merges, this PR would need a bit more work (because it re-introduces some of that). I'm thinking it would actually make sense to change the rendering of MergedText so that it doesn't directly output a String, but a ParsedMerge instead, which could later be rendered to a String. This means that MergeResult could just have the ParsedMerge as a field (instead of `contents: String` currently). Does that sound like a good idea to you, @ada4a?

ada4a commented

2025-04-28 19:34:46 +02:00

Owner

I'm thinking it would actually make sense to change the rendering of MergedText so that it doesn't directly output a String, but a ParsedMerge instead, which could later be rendered to a String

This definitely sounds very appealing! Not needing to duplicate the rendering logic is especially nice. I'm not sure how hard this change would be to implement...

This means that MergeResult could just have the ParsedMerge as a field (instead of contents: String currently)

That would be lovely! We could probably avoid storing cached conflict_mass and conflict_count as well...

> I'm thinking it would actually make sense to change the rendering of MergedText so that it doesn't directly output a String, but a ParsedMerge instead, which could later be rendered to a String This definitely sounds very appealing! Not needing to duplicate the rendering logic is especially nice. I'm not sure how hard this change would be to implement... > This means that MergeResult could just have the ParsedMerge as a field (instead of contents: String currently) That would be lovely! We could probably avoid storing cached `conflict_mass` and `conflict_count` as well...

wetneb commented

2025-04-30 16:47:31 +02:00

Author

Owner

After a quick investigation, I think ParsedMerge is actually not a good fit for this, because (as its name indicates) it is geared towards parsed representations of line-based merges. It comes with an indexing logic to be able to generate matchings between revisions, which doesn't play well with representing the output of a structured merge (as it would require us to generate certain byte offsets which are complicated to compute and unnecessary when representing a merge output).

The MergedText struct is much more fitting, because it lets us represent conflicts independently from line boundaries and isn't encumbered by this indexing. This should make it possible to check for syntactic validity regardless of whether the compact mode is on.

After a quick investigation, I think `ParsedMerge` is actually not a good fit for this, because (as its name indicates) it is geared towards *parsed* representations of line-based merges. It comes with an indexing logic to be able to generate matchings between revisions, which doesn't play well with representing the output of a structured merge (as it would require us to generate certain byte offsets which are complicated to compute and unnecessary when representing a merge output). The `MergedText` struct is much more fitting, because it lets us represent conflicts independently from line boundaries and isn't encumbered by this indexing. This should make it possible to check for syntactic validity regardless of whether the compact mode is on.

ada4a commented

2025-04-30 17:18:10 +02:00

Owner

Completely agree – and ParsedMerge should be trivially translatable into MergedText

Completely agree – and `ParsedMerge` should be trivially translatable into `MergedText`

wetneb commented

2025-05-06 09:49:28 +02:00

Author

Owner

The example in #320 is still considered valid by tree-sitter-python, so the approach followed in this PR will not make it possible to detect its incorrectness.
Instead, I think we should not just parse the revisions, but also check that the parsed trees are consistent with the merged tree they are generated from. I'm opening a new PR for this: #368.

The example in #320 is still considered valid by tree-sitter-python, so the approach followed in this PR will not make it possible to detect its incorrectness. Instead, I think we should not just parse the revisions, but also check that the parsed trees are consistent with the merged tree they are generated from. I'm opening a new PR for this: #368.

wetneb closed this pull request

2025-05-06 09:49:28 +02:00

/ test (pull_request) Successful in 1m1s

Details

Pull request closed

This pull request cannot be reopened because the branch was deleted.

Rows
Columns

WIP: fix: Check the syntactic validity of merges #321

Pull request closed