fix: Reject parse trees with missing nodes #587
No reviewers
Labels
No labels
Compat/Breaking
Kind
Bad merge
Kind
Bug
Kind
Documentation
Kind
Enhancement
Kind
Feature
Kind
New language
Kind
Security
Kind
Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: mergiraf/mergiraf#587
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "554-missing-nodes"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Fixes #554.
I had to adjust a bunch of test cases which cannot parse anymore.
The fact that our JSON parser doesn't accept trailing commas is a
concern, I think we should migrate to a more accepting one, such as
https://github.com/Joakker/tree-sitter-json5.
May I ask why? That is consistent with the spec, after all
I think the variants without commas at the end are widespread enough that it would be worth supporting it.
LGTM, just some questions regarding the error message
@ -329,0 +330,4 @@
let full_range = node.range();
return Err(format!(
"parse error at {}:{}, expected a missing token: {}",
I think having the node kind in backticks would help readibility, WDYT?
Also, "expecting" sounds a bit wrong here I think? If anything, we "encountered" a missing node. But given that we output the kind of the missing node, maybe phrasing it like "parse error at {pos}, a missing
{kind}
node" would work? Not really sure about this one thoughWell, if you look at it from mergiraf's internal point of view, yes, we are encountering a "MISSING token" returned by the tree-sitter parser. But from a user perspective, I would rather describe the problem as "we expect this bit of syntax at this position and it's not there". I'm trying to formulate the error message in a way that makes more sense to the user.
So maybe just
"expected
{}"
would do, the missing part is sort of redundant I would say.Oh, that does sound better! Yeah, let's go with that
@ -1241,0 +1252,4 @@
let ctx = ctx();
let lang_profile = LangProfile::detect_from_filename("test.java")
.expect("Could not load the Java lang profile");
let parse = AstNode::parse("class Test {", lang_profile, &ctx.arena, &ctx.ref_arena);
nit: I think at this point we could as well use
Arena::new()
directly here..@ -570,3 +570,3 @@
fn dont_bundle_into_delims() {
let ctx = ctx();
let source = "(/* this is a comment */)";
let source = "fn test(/* this is a comment */) {}";
nit: I think I remember this being parsed as an
expression_statement
, so adding a semicolon after the tuple would've probably sufficed? Though your fix works just as well of courseSorry, do you mean variants with commas? I honestly have no idea, but I imagine that if someone was consciouy opting into the JSON5 grammar, they would the file the
.json5
ending, or something like that? Because if they went with.json
, it'd just be a matter of time before some program with a conformant parser would reject their file?Yes sorry, I mean with trailing commas… I think in general it's good if our parsers err more on the accepting side (which is often the case for syntax-highlighting grammars), so that we are able to deal with slightly broken input.
Another reason to switch to that parser would be, for instance, to better support JSON snippets in markdown which could have comments, for documentation purposes, even though it's not valid JSON:
You can see how Forgejo itself accepts to highlight the comment even though I specified that this sample is in
json
, notjson5
.Anyway, I don't have strong feelings about it, but I wouldn't do it in this PR anyway.
I would be excited to merge this PR because it's blocking the addition of a test case I wrote for the minimizer, which would enable us to merge that one too… ^^
Of course, go on! (after the error message thing is done^^)
a55b46ce07
to9fdae959aa
Ah yes sorry about that, it looks like my attention span is getting dangerously small… To my defense, I went into a succession of nested rabbit holes:
it's rare that it gets that bad! :-D
9fdae959aa
to68845ede84
LGTM, thank you!