fix: Prevent the commutative merging of all extra elements #562

Merged
wetneb merged 4 commits from extra into main 2025-08-16 11:35:43 +02:00
Owner

Closes #467.

This will likely allow us to remove quite a few children groups from language profiles, as those were often added for the purpose of excluding comments from commutative merging.

Closes #467. This will likely allow us to remove quite a few children groups from language profiles, as those were often added for the purpose of excluding comments from commutative merging.
clippy
All checks were successful
/ test (pull_request) Successful in 40s
dc294dbcae
wetneb changed title from fix: Prevent the commutative of all extra elements to fix: Prevent the commutative merging of all extra elements 2025-08-06 09:48:04 +02:00
ada4a left a comment
Owner

The implementation is good already, but I think we could improve it even further

The implementation is good already, but I think we could improve it even further
@ -53,2 +53,4 @@
/// An internal node id, guaranteed to be unique within the tree.
pub id: usize,
/// Whether this node is one that can appear anywhere in the source (such as a code comment)
pub is_extra: bool,
Owner

That's one expensive boolean you've got there 😅 (64 bits of space for 1 bit of info). Not like I can come up with something better though.. one could of course make AstNode into an enum of Regular/Extra, but it would be such a massive headache to modify all the code touching AST nodes throughout the codebase to handle that change

That's one expensive boolean you've got there 😅 (64 bits of space for 1 bit of info). Not like I can come up with something better though.. one could of course make `AstNode` into an enum of `Regular`/`Extra`, but it would be _such_ a massive headache to modify all the code touching AST nodes throughout the codebase to handle that change
Owner

Anyway, maybe we want to include it in the Display output while we're at it? Not sure quite how useful it would be though

Anyway, maybe we want to include it in the `Display` output while we're at it? Not sure quite how useful it would be though
Author
Owner

I was naively hoping that Rust would be able to magically make this compact by bundling it up with another field, but that was wishful thinking… If we want to shave some bits, I guess we could probably bundle this up in the id field (for instance as least significant bit), but I am not sure it's worth it.

The ideal situation would be to be able to look up directly from theLangProfile whether a certain node type is extra or not, but that's not information that's readily available sadly.

I was naively hoping that Rust would be able to magically make this compact by bundling it up with another field, but that was wishful thinking… If we want to shave some bits, I guess we could probably bundle this up in the `id` field (for instance as least significant bit), but I am not sure it's worth it. The ideal situation would be to be able to look up directly from the`LangProfile` whether a certain node type is `extra` or not, but that's not information that's readily available sadly.
ada4a marked this conversation as resolved
@ -651,0 +654,4 @@
.any(|repr| repr.node.is_extra)
{
Err(
"One of the children is an extra node, cannot merge commutatively"
Owner

Hm, it's a bit unfortunate that this error only happens inside commutatively_merge_lists, because its description says that it should only be called if we know that the lists are safe to merge. So for example in build_subtree_from_changeset, we:

  1. check if the parent is commutative
  2. if so, call this method and ? the error
  3. otherwise, create a conflict tree

Wouldn't it be nicer if we could tell the caller that the children weren't commutative after all, so that they could still fall back onto some other strategy?

I think this is a case which could benefit from a more descriptive error type – something roughly like this:

enum CommutativelyMergeListsError {
    ChildrenNotCommutative(String),
    Other(String)
}

That way, the caller could match on the error kind and do the best thing

Hm, it's a bit unfortunate that this error only happens _inside_ `commutatively_merge_lists`, because its description says that it should only be called _if_ we know that the lists are safe to merge. So for example in `build_subtree_from_changeset`, we: 1. check if the parent is commutative 2. if so, call this method and `?` the error 3. otherwise, create a conflict tree Wouldn't it be nicer if we could tell the caller that the children weren't commutative after all, so that they could still fall back onto some other strategy? I think this is a case which could benefit from a more descriptive error type – something roughly like this: ```rs enum CommutativelyMergeListsError { ChildrenNotCommutative(String), Other(String) } ``` That way, the caller could match on the error kind and do the best thing
Author
Owner

It's a change I have thought about doing in #563 but I was reluctant because it's a change to the actual merging logic which could have unintended consequences. But I agree with you that it is worth doing.

It's a change I have thought about doing in https://codeberg.org/mergiraf/mergiraf/pulls/563 but I was reluctant because it's a change to the actual merging logic which could have unintended consequences. But I agree with you that it is worth doing.
Owner

Yeah it's probably best to leave this to a future PR -- could you please add a TODO comment?

Yeah it's probably best to leave this to a future PR -- could you please add a TODO comment?
Author
Owner

I've opened a PR for it instead: #568

I've opened a PR for it instead: https://codeberg.org/mergiraf/mergiraf/pulls/568
ada4a marked this conversation as resolved
@ -651,1 +662,4 @@
}
})
.try_collect()?;
let Some(raw_separator) = commutative_parent.child_separator(&child_types) else {
Owner

I realize that this method is currently only called in this one place, but more generally, if we simplify our commutative children groups after this change, wouldn't it need to also check that the provided list of children doesn't contain any extra nodes? In fact, maybe the check should just be moved there..

I realize that this method is currently only called in this one place, but more generally, if we simplify our commutative children groups after this change, wouldn't it need to _also_ check that the provided list of children doesn't contain any extra nodes? In fact, maybe the check should just be moved there..
Author
Owner

I would be very keen to do that if we had a simple way to determine if a given node type is extra. At the moment we can only do it for a concrete node, sadly.

In the future, we might be able to parse the node-types.json file that is available from the crate of the parser and extract the node types marked as extra from there. Ideally we'd do this at compilation time so that we don't embed big JSON blobs in our binary, just for them to be parsed at runtime. For this to work, we'll also need our tree-sitter parsers to be generated by a recent enough version of tree-sitter, because there was a bug in the generation of this information.

If we go ahead with my current solution I'd open an issue about making this move, as it would indeed be more principled and save some memory.

I would be very keen to do that if we had a simple way to determine if a given node *type* is extra. At the moment we can only do it for a concrete node, sadly. In the future, we might be able to parse the `node-types.json` file that is available from the crate of the parser and extract the node types marked as `extra` from there. Ideally we'd do this at compilation time so that we don't embed big JSON blobs in our binary, just for them to be parsed at runtime. For this to work, we'll also need our tree-sitter parsers to be generated by a recent enough version of tree-sitter, because [there was a bug in the generation of this information](https://github.com/tree-sitter/tree-sitter/pull/4557). If we go ahead with my current solution I'd open an issue about making this move, as it would indeed be more principled and save some memory.
Owner

if we had a simple way to determine if a given node type is extra

Oh, right, I completely overlooked this.

Let's go with this approach then, but maybe also add a FIXME comment to CommutativeParent::child_separator?

> if we had a simple way to determine if a given node _type_ is extra Oh, right, I completely overlooked this. Let's go with this approach then, but maybe also add a FIXME comment to `CommutativeParent::child_separator`?
ada4a marked this conversation as resolved
Add comment
All checks were successful
/ test (pull_request) Successful in 42s
138fce747f
ada4a approved these changes 2025-08-16 11:06:40 +02:00
ada4a left a comment
Owner

LGTM!

LGTM!
@ -0,0 +1,7 @@
<<<<<<< LEFT
Owner

off-topic (I think): I was wondering what this would look like in compact mode. I expected something like this:

use crate::{foo,
<<<<<<< LEFT
 /* bar */
||||||| BASE
=======
 /* baz */
>>>>>>> RIGHT
};

but apparently it's this instead:

use
<<<<<<< LEFT
 crate::{foo, /* bar */};
||||||| BASE
 crate::{foo};
=======
 crate::{foo, /* baz */};
>>>>>>> RIGHT

I wonder whether that's some sort of a bug in MergedText

off-topic (I think): I was wondering what this would look like in compact mode. I expected something like this: ``` use crate::{foo, <<<<<<< LEFT /* bar */ ||||||| BASE ======= /* baz */ >>>>>>> RIGHT }; ``` but apparently it's this instead: ``` use <<<<<<< LEFT crate::{foo, /* bar */}; ||||||| BASE crate::{foo}; ======= crate::{foo, /* baz */}; >>>>>>> RIGHT ``` I wonder whether that's some sort of a bug in `MergedText`
wetneb merged commit feb627eb46 into main 2025-08-16 11:35:43 +02:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: mergiraf/mergiraf#562
No description provided.