fix: Check that the merged text is syntactically valid and consistent with the merged tree #368
No reviewers
Labels
No labels
Compat/Breaking
Kind
Bad merge
Kind
Bug
Kind
Documentation
Kind
Enhancement
Kind
Feature
Kind
New language
Kind
Security
Kind
Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: mergiraf/mergiraf#368
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "wetneb/mergiraf:320-v3"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Fixes #320.
Instead of just parsing the text obtained in the final rendered merge, this goes a step further and checks that the corresponding trees are "isomorphic" to the merged tree.
The isomorphism check is rather optimistic, as it gives up on sections of the merged tree that are obtained using line-based merges. To be able to also check the validity of those, we'd need to be able to parse snippets of text with a given node type as target, which is an open feature request in tree-sitter: https://github.com/tree-sitter/tree-sitter/issues/711.
mostly stylistic nits
I like the idea of having a README, and linking to the corresponding issue:)
@ -138,0 +166,4 @@
.iter()
.map(|section| match section {
MergeSection::Merged(contents) => contents.as_ref(),
MergeSection::Conflict { left, base, right } => match revision {
misc: I feel like we have this logic of "get a particular revision from a conflict" repeated a lot of times throughout the codebase. I wonder if it would be worth it to add a struct called
Conflict
which would look something like this:and use it inside all the different enums that have
Conflict
as a variant:One argument against doing this would be that sometimes we want to do something will all sides, not just one.
I had a quick look at the rest of the codebase and there are indeed a couple of other places where we do similar things. I wouldn't be opposed to such a refactoring, even though I'm not 100% sure if it will make the code more readable / maintainable (the
Conflict(Conflict)
definition is perhaps a bit confusing). Either way, perhaps it's worth doing it as a separate follow-up PR, no?Of course. I just wanted to note that down for myself so that I don't forget
@ -278,0 +297,4 @@
ast_node.isomorphic_to(other_node)
}
MergedTree::MixedTree { node, children, .. } => {
node.grammar_name() == other_node.grammar_name && {
I have a couple of remarks for this part of code, which are somewhat hard to isolate, so here they all are:
contains_line_based_merge
, I'd say there's no reason to intermingle its calculation with that ofchildren_at_rev
-- something like this should be more clearThen one could early-return directly there.
3. You
collect
the iterator, I guess in order to be able to short-circuit on mismatched lengths. I feel like that the length-comparison would make sense if we had the two vecs from the beginning -- but here we first allocate everything just to drop directly afterwards (if the lengths do end up being different). I think it would be more efficient to just compare the iterators directlyAbsolutely! About point 3, I've been looking for a more elegant way of doing it. If we just keep them as iterators, then the iterator will need to be exhausted twice to first compare the length and then iterate again through it, no?
Exactly, that's why I'd suggest removing the length comparison entirely – it'd make sense if we had two vecs (so that we could iterate multiple times over the same vec), but since we don't, running the iterator twice would actually run all the computations in it twice, which is obviously a waste.
Ok but then we need to replace
zip
byzip_longest
fromitertools
right? And arguably in some cases it might be more efficient to just abort the isomorphism check earlier as soon as the lengths differ. But I guess our expectation is that isomorphism checks generally succeed, so maybe it's not a big concern.Ohh, that's why you wanted to check the lengths! Then yes, something like
zip_longest
is definitely needed imo. Or maybe justcollect
after all...@ -278,0 +322,4 @@
Revision::Left => left,
Revision::Right => right,
};
nodes
this could be replaced with
nodes.iter().copied().map(MergedChild::Original).collect()
, which even fits on one line!@ -278,0 +341,4 @@
) => !separator.trim().is_empty(),
MergedChild::Merged(MergedTree::MixedTree {
children, ..
}) if children.is_empty() => false,
I think it's more consistent to do
!children.is_empty()
here. One could alternatively pull the other arm's thing into the match arm, but this style feels better to me@ -112,0 +116,4 @@
lang_profile,
&arena,
&ref_arena,
)?;
I think it'd be good to give a bit of a context to this
?
? What does it mean for us to fail to parse the tree in this case?What I had in mind is
map_err
, but a comment works as well, sure! Thanks:)Ah,
map_err
does sound like a better idea, in case we rely on the downstream error later on.@ -112,0 +130,4 @@
} else {
STRUCTURED_RESOLUTION_METHOD
};
Ok(MergeResult::from_merged_text(
I feel like the logic flow would be better if
from_merged_text
would be replaced withinto_merge_result
, similar toresult_tree.to_merged_text
above. I guess you've went with this option because of the already-existingMergeResult::from_parsed_merge
, so maybe you could addMergedText::into_merge_result
that just calledfrom_merged_text
internally. Not really important thoughIf you prefer
into
tofrom
, then I'm happy to just remove thefrom
methods and replace them byinto
versions of them. Or do you see a reason to keep both?Sure, why not!
70b9f94195
to99b840cb75
@ -669,9 +669,11 @@ impl<'a> AstNode<'a> {
.then(|| format!(" {}", Color::Red.paint(self.source.replace('\n', "\\n"))))
.unwrap_or_default();
let commutative = (next_parent.is_some())
what's this all about? 😅
That's a different version of clippy wanting to rewrite this for us… :-D I'll revert it.
de98e09985
toeb7a3aa3cf