Exclude commutatively merging of comments by default #467

Closed
opened 2025-07-03 14:53:04 +02:00 by wetneb · 2 comments
Owner

Follow-up to #459.

At the moment, all commutative parent declarations ought to be restricted to the children types they are meant to contain, so that commutative merging of comments is excluded. This is because the positioning of comments is something that carries meaning and the AST structure doesn't encode that meaning.

But it's annoying to have to list explicitly all possible children types of a node. If a new feature is added in the grammar later on, we will probably not notice it and forget to add it to the list.

An 'extra' node is a node type declared in the definition of the tree-sitter grammar, that can appear anywhere: https://tree-sitter.github.io/tree-sitter/creating-parsers/2-the-grammar-dsl.html
A solution would be to refuse commutatively merging any 'extra' node, but at the moment it's not possible to detect if a node is an 'extra' or not.

But I am thinking of another solution: it looks like all possible children of a given node type can be looked up programmatically from the NODE_TYPES constant exposed by grammars (which does not include the extras). We could restrict commutativity to those nodes listed there. It's non-trivial though, because it requires expanding the declared set of possible children via the subtypes.

I've opened a feature request in tree-sitter to expose the extras list through bindings:
https://github.com/tree-sitter/tree-sitter/issues/4556

Follow-up to #459. At the moment, all commutative parent declarations ought to be restricted to the children types they are meant to contain, so that commutative merging of comments is excluded. This is because the positioning of comments is something that carries meaning and the AST structure doesn't encode that meaning. But it's annoying to have to list explicitly all possible children types of a node. If a new feature is added in the grammar later on, we will probably not notice it and forget to add it to the list. An 'extra' node is a node type declared in the definition of the tree-sitter grammar, that can appear anywhere: https://tree-sitter.github.io/tree-sitter/creating-parsers/2-the-grammar-dsl.html A solution would be to refuse commutatively merging any 'extra' node, but at the moment it's not possible to detect if a node is an 'extra' or not. But I am thinking of another solution: it looks like all possible children of a given node type can be looked up programmatically from the `NODE_TYPES` constant exposed by grammars (which does not include the extras). We could restrict commutativity to those nodes listed there. It's non-trivial though, because it requires expanding the declared set of possible children via the subtypes. I've opened a feature request in tree-sitter to expose the `extras` list through bindings: https://github.com/tree-sitter/tree-sitter/issues/4556
wetneb changed title from Exclude commutatively merging of comments by defauft to Exclude commutatively merging of comments by default 2025-07-03 17:21:51 +02:00
wetneb self-assigned this 2025-08-02 18:10:36 +02:00
Author
Owner

I'm not sure how I missed that, it's totally possible, there is a node.is_extra() method that exists since 2019…

I'm not sure how I missed that, it's totally possible, there is a `node.is_extra()` method that exists since 2019…
Author
Owner

PR #562 implements this based on whether a concrete node is extra.

As mentioned above, in the future, we might be able to parse the node-types.json file that is available from the crate of the parser and extract the node types marked as extra from there. Ideally we'd do this at compilation time so that we don't embed big JSON blobs in our binary, just for them to be parsed at runtime. For this to work, we'll also need our tree-sitter parsers to be generated by a recent enough version of tree-sitter, because there was a bug in the generation of this information.

PR #562 implements this based on whether a concrete node is extra. As mentioned above, in the future, we might be able to parse the `node-types.json` file that is available from the crate of the parser and extract the node types marked as `extra` from there. Ideally we'd do this at compilation time so that we don't embed big JSON blobs in our binary, just for them to be parsed at runtime. For this to work, we'll also need our tree-sitter parsers to be generated by a recent enough version of tree-sitter, because [there was a bug in the generation of this information](https://github.com/tree-sitter/tree-sitter/pull/4557).
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: mergiraf/mergiraf#467
No description provided.