Syntactically invalid files are attempted to be parsed twice #551
Labels
No labels
Compat/Breaking
Kind
Bad merge
Kind
Bug
Kind
Documentation
Kind
Enhancement
Kind
Feature
Kind
New language
Kind
Security
Kind
Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: mergiraf/mergiraf#551
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
In
mergiraf solve
mergiraf merge
, when one of the input files fails to parse, we still parse it twice:It would be better to do it only once.
Another optimization that could be worth doing in the case of syntactically invalid files is avoiding to translate tree-sitter's data structures to our
AstNode
, by callingroot.has_error()
.Note that this would likely prevent us from returning an error with information about where the parse error happened, which would be detrimental in the case of
mgf_dev parse
, or even in the case ofmergiraf merge -v
which reports it in the logs.@wetneb wrote in #551 (comment):
you mean
mergiraf merge
? I wasn't able to find the steps your describe inmergiraf solve
..Sorry, yes, my bad.
I struggle to come up with a solution to this. Here are some options that come to mind
Option 1
Result
s fromline_based_merge_with_duplicate_signature_detection
cascading_merge
, check whether any of those areErr
s, and if so, don't even attempt to do any structural thingsThat would have a few problems:
structured_merge
, so we'll still duplicate some workcascading_merge
Option 2
Change
structured_merge
to acceptcontents_*: Result<AstNode, String>
, so that we can pass the already parsed ASTs into thereThat would work I guess, but it would mean that we'd need to also pre-parse
contents
in all the other callsites ofstructured_merge
, which sounds a bit unfortunate?Okay, you went for the ambitious (and proper) fix of avoiding double parsing not just if the parsing fails, but in general. That makes sense.
Option 2 sounds the cleanest to me. It shouldn't be hard to still provide another function that takes revisions as strings, does the parsing, and then delegates it to the function that takes revisions as trees, no? For the other function, I think it would make sense to have it take
&AstNode
directly (and notResult<&AstNode, String>
because the check for zdiff3 could also be done externally (we don't need to do it twice either).To minimize the diff, we could even try to keep the existing
structured_merge
function with its existing signature, and introduce a new one that takesAstNode
s as arguments. But anyway,structured_merge
is only called from 4 different places at the moment, so the diff should be manageable either way.One slight subtlety is that this refactoring might pull the parsing out of the code section that's protected by our timeout mechanism.
It's not uncommon for parsers to run wild, for instance if it relies on a hand-written lexer (scanner) that falls into an infinite loop. So if it's not too hard, it could be worth maintaining this protection one way or another.
I'm pretty sure it already is outside (and also inside, but that's why we're here in the first place^^) – we already parse in
line_based_merge_with_duplicate_signature_detection
Yes, that sounds good.
Hm, not so sure about this one. It is true that
structured_merge
(and its AST-taking counterpart) won't get called that often, but duplicating the zdiff3 check at every callsite still sounds pretty fragile to me.Isn't this check only relevant for
mergiraf solve
, and not formergiraf merge
? I think this refactoring would be a great opportunity to have it only in thesolve
command (which, as far as I can tell, is only one call site ofstructured_merge
) and not in themerge
command, where it will only display false alarms.Oh, haven't thought of that – you're totally right! I'll take a look at the other callsites – maybe they don't need the check either
It turns out that the function is called three times in
solve.rs
:ParsedMerge
(inresolve_merge
)Now that I think of it, the functions used for the latter two could be refactored, by removing the actual merging from them and only leaving their respective revision extraction logic. Then, in
resolve_merge_cascading
, we could use the new functions like this:So that's one fewer place of duplication.
But for the first one, hoisting the check out of
resolve_merge
seems a bit more complicated. One could think that we solve that with the following:.or_else
chain aboveBut that has a problem of revision reconstruction actually being infallible.
It looks to me that it would actually be easier to define a function like
structured_merge_with_zdiff3_detection
which does have the check, use that insolve.rs
, andstructured_merge
elsewhere. But that wouldn't help with the re-parsing problem we started with.That sounds good to me. Some places need to call a function with zdiff3 detection, some without. Some places need to call a function taking strings, other AstNode. We can just introduce all of the combinations of those criteria as needed, that would still be just 4 small functions, no? Anyway, I'm sure you can figure out something that works well.
Unfortunately I got kind of stuck already when trying to extract the parsing out of
structured_merge
-- the lifetimes just don't add up... I'll keep trying and see what I get