feat: Specify different separators for each children group of a commutative parent #374

Merged
wetneb merged 11 commits from wetneb/mergiraf:279-separator-per-children-group into main 2025-05-12 13:48:24 +02:00
Owner

Here's more work on getting whitespace right :)

Fixes #251. Fixes #279.

I have also opened #373 outlining a different approach to the problem. I think we need both solutions: the one proposed here and the one outlined there, as none of them will fix all cases:

  • if there aren't any examples of separators to choose from, we need informed default separators (this PR)
  • if the formatting differs from the standard style assumed by the language profile, we need to follow the surrounding code more closely (#373)
Here's more work on getting whitespace right :) Fixes #251. Fixes #279. I have also opened #373 outlining a different approach to the problem. I think we need both solutions: the one proposed here and the one outlined there, as none of them will fix all cases: * if there aren't any examples of separators to choose from, we need informed default separators (this PR) * if the formatting differs from the standard style assumed by the language profile, we need to follow the surrounding code more closely (#373)
ada4a requested changes 2025-05-10 14:12:46 +02:00
Dismissed
ada4a left a comment
Owner

It's amazing that you were able to implement this with so few changes outside lang_profile.rs:)

I have a couple of questions regarding the implementation though

It's amazing that you were able to implement this with so few changes outside `lang_profile.rs`:) I have a couple of questions regarding the implementation though
@ -235,1 +239,3 @@
.any(|group| group.node_types.is_superset(node_types))
/// If so, return the separator to use when inserting two nodes
/// in the same place.
pub(crate) fn children_can_commute(&self, node_types: &HashSet<&str>) -> Option<&'static str> {
Owner

Maybe rename this to something like commutative_child_separator? That makes the role of this method a bit clearer imo -- "if the children can commute, here's the separator, otherwise I'll return a None"

Maybe rename this to something like `commutative_child_separator`? That makes the role of this method a bit clearer imo -- "if the children can commute, here's the separator, otherwise I'll return a `None`"
ada4a marked this conversation as resolved
@ -236,0 +239,4 @@
/// If so, return the separator to use when inserting two nodes
/// in the same place.
pub(crate) fn children_can_commute(&self, node_types: &HashSet<&str>) -> Option<&'static str> {
self.children_groups
Owner

This is very confusing imo...

I first simplified this by replacing then_some().or() with just an if-else:

if self.children_groups.is_empty() {
    Some(self.separator)
} else {
    self.children_groups
        .iter()
        .find_map(|group| {
            if group.node_types.is_superset(node_types) {
                group.separator.or(Some(self.separator))
            } else {
                None
            }
        })
}

After that I thought I could extract .or(Some(self.separator)) out of find_map since it's applied to every element anyway, but then I realized that that's actually not completely true -- apparently we:

  1. check if node_types are all in the ChildrenGroup we're looking at
  2. if they are, we either use the separator ChildrenGroup specifies if it does specify one, or the default one for the whole commutative group if it doesn't

So maybe it would be nice to add a comment along the lines of what I've described above (if I understood everything correctly).

Thinking about this a bit more, why do we unconditionally return Some(self.separator) in the self.children_groups.is_empty() case? Granted, that seems to have been the case before as well, but wouldn't we need that node_types actually are all part of this commutative group? Or is that an implicit assumption that this method makes?

This is very confusing imo... I first simplified this by replacing `then_some().or()` with just an if-else: ```rs if self.children_groups.is_empty() { Some(self.separator) } else { self.children_groups .iter() .find_map(|group| { if group.node_types.is_superset(node_types) { group.separator.or(Some(self.separator)) } else { None } }) } ``` After that I thought I could extract `.or(Some(self.separator))` out of `find_map` since it's applied to every element anyway, but then I realized that that's actually not completely true -- apparently we: 1. check if `node_types` are all in the `ChildrenGroup` we're looking at 2. if they are, we either use the separator `ChildrenGroup` specifies if it does specify one, or the default one for the whole commutative group if it doesn't So maybe it would be nice to add a comment along the lines of what I've described above (if I understood everything correctly). Thinking about this a bit more, why do we unconditionally return `Some(self.separator)` in the `self.children_groups.is_empty()` case? Granted, that seems to have been the case before as well, but wouldn't we need that `node_types` actually are all part of this commutative group? Or is that an implicit assumption that this method makes?
Author
Owner

Yes your understanding is correct, I'll add some comments to confirm.

Yes your understanding is correct, I'll add some comments to confirm.
ada4a marked this conversation as resolved
@ -243,1 +259,4 @@
pub node_types: HashSet<&'static str>,
/// An optional separator specific to this children group,
/// better suited than the one from the commutative parent.
/// It must only differ from the separator of the parent up to
Owner

maybe add this as an assertion in CommutativeGroup::restricted_to?

maybe add this as an assertion in `CommutativeGroup::restricted_to`?
ada4a marked this conversation as resolved
@ -448,2 +456,3 @@
CommutativeParent::new("field_declaration_list", "{\n", "\n", "\n}\n")
.restricted_to_groups(&[&["field_declaration"], &["function_definition"]]),
.restricted_to(vec![
ChildrenGroup::new(&["field_declaration"]),
Owner

I've been wanting to turn those slices into arrays (to avoid copying later), but it was impossible back then since you can't store arrays of different sizes in the same collection (the outer slice in this case). But now that we wrap each inner slice in ChildrenGroup::new / ChildrenGroup::with_separator, we should actually be able to. And the outer slice (which has now become a vec), should be able to be turned into an array as well.

None of that should be a part of this PR of course -- I'll try to remember to look into that after we've merged this

I've been wanting to turn those slices into arrays (to avoid copying later), but it was impossible back then since you can't store arrays of different sizes in the same collection (the outer slice in this case). But now that we wrap each inner slice in `ChildrenGroup::new` / `ChildrenGroup::with_separator`, we should actually be able to. And the outer slice (which has now become a vec), should be able to be turned into an array as well. None of that should be a part of this PR of course -- I'll try to remember to look into that after we've merged this
Owner

Nope, this still is not possible -- CommutativeGroup::restricted_to_groups still operates on slices of slices, but if we were to replace ChildrenGroup methods to accept arrays, we'd need to pass slices of arrays to restricted_to_groups, which won't work (in general) since slices of differently-sized arrays are not allowed. We could of course instead introduce new, array-based alternatives to ChildrenGroup::{new,with_separator}, but that's really not worth it

Nope, this still is not possible -- `CommutativeGroup::restricted_to_groups` still operates on slices of slices, but if we were to replace `ChildrenGroup` methods to accept arrays, we'd need to pass slices of arrays to `restricted_to_groups`, which won't work (in general) since slices of differently-sized arrays are not allowed. We could of course instead introduce new, array-based alternatives to `ChildrenGroup::{new,with_separator}`, but that's really not worth it
ada4a marked this conversation as resolved
@ -681,2 +681,3 @@
return Err("The children are not allowed to commute".to_string());
}
};
let trimmed_sep = raw_separator.trim();
Owner

Wouldn't you need to use this trimmed_sep above, when constructing the leader vecs? Or is it not relevant there for some reason?

Wouldn't you need to use this `trimmed_sep` above, when constructing the leader vecs? Or is it not relevant there for some reason?
Author
Owner

This line is superfluous indeed since we assume that the trimmed versions of both separators are the same. I'll remove it.

This line is superfluous indeed since we assume that the trimmed versions of both separators are the same. I'll remove it.
ada4a marked this conversation as resolved
Integrate review feedback
All checks were successful
/ test (pull_request) Successful in 59s
6a6bbfde0a
@ -228,1 +229,4 @@
/// Restrict a commutative parent to some children groups, possibly with their own separators
pub(crate) fn restricted_to(self, children_groups: Vec<ChildrenGroup>) -> Self {
#[cfg(debug_assertions)]
Owner

just fyi: this could also be achieved by replacing the assert! below with debug_assert! -- the then empty loop will definitely get optimized away. But this works just as well of course

just fyi: this could also be achieved by replacing the `assert!` below with `debug_assert!` -- the then empty loop will definitely get optimized away. But this works just as well of course
Author
Owner

Ah ok, I wasn't sure… I guess it also makes it clear to the reader that the whole check is done only in debug mode.

Ah ok, I wasn't sure… I guess it also makes it clear to the reader that the whole check is done only in debug mode.
Owner

I guess it also makes it clear to the reader that the whole check is done only in debug mode.

Yeah, also wanted to mention that. Since the two approaches are otherwise identical, let's go with the current, more readable one

> I guess it also makes it clear to the reader that the whole check is done only in debug mode. Yeah, also wanted to mention that. Since the two approaches are otherwise identical, let's go with the current, more readable one
ada4a marked this conversation as resolved
@ -229,0 +233,4 @@
{
for children_group in &children_groups {
if let Some(specific_separator) = children_group.separator {
assert!(
Owner

consider using assert_eq! instead?

consider using `assert_eq!` instead?
ada4a marked this conversation as resolved
wetneb force-pushed 279-separator-per-children-group from 3b629eaa95 to fb27ef61a6 2025-05-11 13:46:48 +02:00 Compare
@ -678,3 +678,3 @@
.map(Leader::grammar_name)
.collect();
if !commutative_parent.children_can_commute(&child_types) {
let Some(raw_separator) = commutative_parent.child_separator(&child_types) else {
Owner

I kind of have the same question I had earlier regarding trimmed_sep, but with raw_separator now -- why is it okay that above, we get it simply as commutative_parent.separator, but then go on to the lengths of calling child_separator? Well that's probably because we don't actually use the first raw_separator anywhere -- instead we just trim it right away.

I see that there are a couple more places around the codebase where we still access commutative_parent.separator directly, but all of them, again, trim it before using. So maybe it'd be a good idea to hide the separator field (so remove its pub)? That way, we'd either need to provide the children to get the separator that they want, or be content with the trimmed separator, since that one will be the same for the parent and the children. The second option could be realized using a new CommutativeParent::trimmed_separator() method

I kind of have the same question I had earlier regarding `trimmed_sep`, but with `raw_separator` now -- why is it okay that above, we get it simply as `commutative_parent.separator`, but then go on to the lengths of calling `child_separator`? Well that's probably because we don't actually use the first `raw_separator` anywhere -- instead we just `trim` it right away. I see that there are a couple more places around the codebase where we still access `commutative_parent.separator` directly, but all of them, again, `trim` it before using. So maybe it'd be a good idea to hide the `separator` field (so remove its `pub`)? That way, we'd either need to provide the children to get the separator that _they_ want, or be content with the trimmed separator, since that one will be the same for the parent and the children. The second option could be realized using a new `CommutativeParent::trimmed_separator()` method
Author
Owner

Sounds like an excellent idea, I'll do that.

Sounds like an excellent idea, I'll do that.
ada4a marked this conversation as resolved
@ -229,0 +237,4 @@
assert_eq!(
specific_separator.trim(),
self.separator.trim(),
"Children group separator '{}' inconsistent with parent separator '{}' in commutative parent '{}'",
Owner

Formatting a string with {:?} uses the Debug representation, which, among others, escapes the special characters, and enclosed the whole string in double quotes. So I think that could replace both the escape_separator helper and the single quotes in the format string?

Formatting a string with `{:?}` uses the Debug representation, which, among others, escapes the special characters, and enclosed the whole string in double quotes. So I think that could replace both the `escape_separator` helper and the single quotes in the format string?
Author
Owner

Fantastic, that's much better of course!

Fantastic, that's much better of course!
ada4a marked this conversation as resolved
Use :? in debug assertion
All checks were successful
/ test (pull_request) Successful in 55s
39d53b6d2f
ada4a approved these changes 2025-05-12 13:17:46 +02:00
wetneb merged commit 9343f20bcf into main 2025-05-12 13:48:24 +02:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: mergiraf/mergiraf#374
No description provided.