feat(TypeScript): commutative merging for union and intersection types #531

Merged
ada4a merged 7 commits from wetneb/flatten_binary_operators into main 2025-07-26 14:55:23 +02:00 AGit
Owner

Closes #484.

This is the approach 2. that I mentioned there. I would have preferred approach 1., but it looks very hard to change the grammar accordingly. Arguably, not having to change the grammar at all is also a good thing. The added complexity to the parsing code looks manageable to me. What do you think?

I don't know if it's feature creep? It doesn't feel that far from mergiraf's core feature set to me.

Closes #484. This is the approach 2. that I mentioned there. I would have preferred approach 1., but it looks very hard to change the grammar accordingly. Arguably, not having to change the grammar at all is also a good thing. The added complexity to the parsing code looks manageable to me. What do you think? I don't know if it's feature creep? It doesn't feel that far from mergiraf's core feature set to me.
perf(AstNode): only flatten node if it has corresponding children
All checks were successful
/ test (pull_request) Successful in 40s
cd1e6dcc58
Author
Owner

Performance-wise, I've added a guard to avoid looking up all node types in the newly introduced flattened_nodes vector, which should also avoid reallocations when the vector gets filled. (The capacity computation could be made even tighter, by subtracting the number of children being expanded, but I'm not sure it's worth tracking that.)

Performance-wise, I've added a guard to avoid looking up all node types in the newly introduced `flattened_nodes` vector, which should also avoid reallocations when the vector gets filled. (The capacity computation could be made even tighter, by subtracting the number of children being expanded, but I'm not sure it's worth tracking that.)
Owner

I'm admittedly not very confident with grammars, but I also wasn't able to come up with a good in-grammar solution -- after looked at my initial attempt at #484 a bit more, I realized that I somehow completely missed the fact that types are nested and would need further flattening. So it probably is better to let the grammar create a correct nesting, and do just the flattening of the result ourselves

I'm admittedly not very confident with grammars, but I also wasn't able to come up with a good in-grammar solution -- after looked at my initial attempt at #484 a bit more, I realized that I somehow completely missed the fact that `type`s are nested and would need further flattening. So it probably is better to let the grammar create a correct nesting, and do just the flattening of the result ourselves
ada4a requested changes 2025-07-24 15:44:27 +02:00
Dismissed
ada4a left a comment
Owner

some small things to get out of the way

some small things to get out of the way
@ -347,6 +351,22 @@ impl<'a> AstNode<'a> {
let grammar_name = node.grammar_name();
// check if this nodes needs flattening.
Owner

s/nodes/node

s/nodes/node
ada4a marked this conversation as resolved
@ -350,0 +357,4 @@
{
let mut flattened_children =
Vec::with_capacity(children.len() + children_with_identical_grammar_type);
for child in children.into_iter() {
Owner

isn't into_iter unnecessary here? I wonder why Clippy doesn't flag it

isn't `into_iter` unnecessary here? I wonder why Clippy doesn't flag it
Author
Owner

Intuitively it makes sense to consume the existing children here, no? We want this variable to get dropped.
If I replace it by &children or children.iter() then flattened_children becomes a Vec<&&AstNode> instead of Vec<&AstNode>.

Intuitively it makes sense to consume the existing children here, no? We want this variable to get dropped. If I replace it by `&children` or `children.iter()` then `flattened_children` becomes a `Vec<&&AstNode>` instead of `Vec<&AstNode>`.
Owner

No, I meant just children – that should iterate by value, just like .into_iter() would

No, I meant just `children` – that should iterate by value, just like `.into_iter()` would
ada4a marked this conversation as resolved
@ -350,0 +359,4 @@
Vec::with_capacity(children.len() + children_with_identical_grammar_type);
for child in children.into_iter() {
if child.grammar_name == grammar_name {
flattened_children.extend(child.children.iter());
Owner

I think this should work?

- flattened_children.extend(child.children.iter())
+ flattened_children.extend(&child.children)
I think this should work? ```diff - flattened_children.extend(child.children.iter()) + flattened_children.extend(&child.children) ```
ada4a marked this conversation as resolved
@ -220,6 +220,7 @@ impl<'a> AstNode<'a> {
let mut children = Vec::new();
let mut field_to_children: FxHashMap<&'a str, Vec<&'a Self>> = FxHashMap::default();
let mut last_child_end = node.byte_range().start;
let mut children_with_identical_grammar_type = 0;
Owner

the name is not quite correct.. what we're counting is the number of children of children with an identical grammar type, but that's of course a mouthful, and I can't think of a simpler name..

the name is not quite correct.. what we're counting is the number of _children_ of children with an identical grammar type, but that's of course a mouthful, and I can't think of a simpler name..
ada4a marked this conversation as resolved
@ -253,2 +254,4 @@
field_to_children.entry(field_name).or_default().push(child);
}
if cursor.node().grammar_id() == node.grammar_id() {
children_with_identical_grammar_type += child.children.len();
Owner

The capacity computation could be made even tighter, by subtracting the number of children being expanded, but I'm not sure it's worth tracking that.

Couldn't we do child.children.len().saturating_sub(1) here to account for that? I think that a small comment could manage to explain this just fine.

Though I guess len will never be 0 -- since the grammar type is of a foldable type, it kind of by definition has non-zero children. So we could replace saturating_sub with just a - -- it would panic in debug mode, working as an assertion, but silently wrap in release mode, which is the expected behaviour of arithmetic it seems (though I don't really understand why)

> The capacity computation could be made even tighter, by subtracting the number of children being expanded, but I'm not sure it's worth tracking that. Couldn't we do `child.children.len().saturating_sub(1)` here to account for that? I think that a small comment could manage to explain this just fine. Though I guess `len` will never be 0 -- since the grammar type is of a foldable type, it kind of by definition has non-zero children. So we could replace `saturating_sub` with just a `-` -- it would panic in debug mode, working as an assertion, but silently wrap in release mode, which is the expected behaviour of arithmetic it seems (though I don't really understand why)
Author
Owner

Makes a lot of sense! It's going to be even harder to find a fitting name for this variable, but I guess comments can do…

Makes a lot of sense! It's going to be even harder to find a fitting name for this variable, but I guess comments can do…
Owner

Maybe children_added_by_flattening or something..

Maybe `children_added_by_flattening` or something..
ada4a marked this conversation as resolved
Owner

The added complexity to the parsing code looks manageable to me. What do you think?

AstNode::internal_new is already a death by thousand cuts... I wish we could create separate-out the different processing steps, but it seems difficult given that we only really create the object at the very end of the method. This flattening in particular looks like something that could be extracted into a method somewhat easily -- it only accesses children and the children-with-identical-whatever, no?

> The added complexity to the parsing code looks manageable to me. What do you think? `AstNode::internal_new` is already a death by thousand cuts... I wish we could create separate-out the different processing steps, but it seems difficult given that we only really create the object at the very end of the method. This flattening in particular looks like something that could be extracted into a method somewhat easily -- it only accesses `children` and the children-with-identical-whatever, no?
Pull out flattening logic to separate method
All checks were successful
/ test (pull_request) Successful in 40s
03eb963c5d
@ -30,2 +30,4 @@
/// See https://tree-sitter.github.io/tree-sitter/3-syntax-highlighting.html#language-injection
pub injections: Option<&'static str>,
/// List of node types that should be flattened
pub flattened_nodes: Vec<&'static str>,
Owner

This could probably be a slice, right? I don't know how well slices would play with a configuration file, but if it does turn out to be impossible, returning to vecs would be quite straightforward

This could probably be a slice, right? I don't know how well slices would play with a configuration file, but if it does turn out to be impossible, returning to vecs would be quite straightforward
Author
Owner

Sure! But actually, I wonder if it's really necessary to introduce this as a new field. I wonder if there is any reason not to flatten all commutative parents by default. I'm curious to investigate which of our existing commutative parents are allowed to appear as direct children of themselves.
Another option would be to have this as a boolean field on the commutative parent definition, indicating whether it should be flattened or not.

I expect this flattening feature will be used mostly (if not only) to declare commutative parents, so that would likely make language profiles a bit tidier…

Sure! But actually, I wonder if it's really necessary to introduce this as a new field. I wonder if there is any reason not to flatten all commutative parents by default. I'm curious to investigate which of our existing commutative parents are allowed to appear as direct children of themselves. Another option would be to have this as a boolean field on the commutative parent definition, indicating whether it should be flattened or not. I expect this flattening feature will be used mostly (if not only) to declare commutative parents, so that would likely make language profiles a bit tidier…
Owner

I wonder if there is any reason not to flatten all commutative parents by default

Hm, wouldn't that be problematic for nested lists, for example? Surely we don't want to flatten this:

list
- [
- list
    - [
    - 1
    - ,
    - 2
    - ]
- ,
-  list
    - [
    - 3
    - ,
    - 4
    - ]
- ]

into this?

list
- [
- [
- 1
- ,
- 2
- ]
- ,
- [
- 3
- ,
- 4
- ]
- ]
> I wonder if there is any reason not to flatten all commutative parents by default Hm, wouldn't that be problematic for nested lists, for example? Surely we don't want to flatten this: ``` list - [ - list - [ - 1 - , - 2 - ] - , - list - [ - 3 - , - 4 - ] - ] ``` into this? ``` list - [ - [ - 1 - , - 2 - ] - , - [ - 3 - , - 4 - ] - ] ```
Author
Owner

Yes, but I find it weird I can't think of any example in our existing commutative parent in our existing language profiles…
In Python there is a syntax for sets, but we haven't added it as a commutative parent somehow. It would be a good concrete example of your case (apart from the fact that sets aren't hashable so t = { 1, { 2, 3 } } raises an exception)… I find that funny.
But you're right, even if we don't have concrete examples, it wouldn't be very future-proof to just mark all commutative parents as associative.

Yes, but I find it weird I can't think of any example in our existing commutative parent in our existing language profiles… In Python there is a syntax for sets, but we haven't added it as a commutative parent somehow. It would be a good concrete example of your case (apart from the fact that sets aren't hashable so `t = { 1, { 2, 3 } }` raises an exception)… I find that funny. But you're right, even if we don't have concrete examples, it wouldn't be very future-proof to just mark all commutative parents as associative.
Owner

One non-commutative type that could be flattenable (but not really associative) are method calls I think?

One non-commutative type that could be flattenable (but not really associative) are method calls I think?
@ -426,0 +446,4 @@
Vec::with_capacity(children.len() + children_added_by_flattening);
for child in children {
if child.grammar_name == grammar_name {
flattened_children.extend(&child.children);
Owner

Actually, couldn't this even be just child.children? Since we're consuming child anyway, we might as well move the fileld out of it

Actually, couldn't this even be just `child.children`? Since we're consuming `child` anyway, we might as well move the fileld out of it
Author
Owner

I'd find it quite impressive if it worked, but the borrow checker doesn't seem to appreciate the idea:

cannot move out of `child.children` which is behind a shared reference
move occurs because `child.children` has type `Vec<&AstNode<'_>>`, which does not implement the `Copy` trait
I'd find it quite impressive if it worked, but the borrow checker doesn't seem to appreciate the idea: ``` cannot move out of `child.children` which is behind a shared reference move occurs because `child.children` has type `Vec<&AstNode<'_>>`, which does not implement the `Copy` trait ```
Owner

ah, right, child is still &AstNode, so you can't move (a field) out of that

ah, right, `child` is still `&AstNode`, so you can't move (a field) out of that
ada4a marked this conversation as resolved
Switch from vector to slice
All checks were successful
/ test (pull_request) Successful in 40s
7ee2a3b33b
ada4a approved these changes 2025-07-25 19:19:48 +02:00
ada4a left a comment
Owner

looking good now!

looking good now!
ada4a merged commit f5cedc4499 into main 2025-07-26 14:55:23 +02:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: mergiraf/mergiraf#531
No description provided.