feat(TypeScript): commutative merging for union and intersection types #531

wetneb · 2025-07-24T15:25:11+02:00

wetneb commented

2025-07-24 15:25:11 +02:00

Closes #484.

This is the approach 2. that I mentioned there. I would have preferred approach 1., but it looks very hard to change the grammar accordingly. Arguably, not having to change the grammar at all is also a good thing. The added complexity to the parsing code looks manageable to me. What do you think?

I don't know if it's feature creep? It doesn't feel that far from mergiraf's core feature set to me.

Closes #484. This is the approach 2. that I mentioned there. I would have preferred approach 1., but it looks very hard to change the grammar accordingly. Arguably, not having to change the grammar at all is also a good thing. The added complexity to the parsing code looks manageable to me. What do you think? I don't know if it's feature creep? It doesn't feel that far from mergiraf's core feature set to me.

wetneb added 3 commits

2025-07-24 15:25:12 +02:00

refactor(LangProfile): Introduce flattened_nodes field 0f41c40dd8

feat(AstNode): Add support for node flattening 305756136f

feat(TypeScript): commutative merging for union and intersection types

/ test (pull_request) Successful in 39s

Details

c1350dbaa9

wetneb added 1 commit

2025-07-24 15:38:55 +02:00

perf(AstNode): only flatten node if it has corresponding children

/ test (pull_request) Successful in 40s

Details

cd1e6dcc58

wetneb commented

2025-07-24 15:41:39 +02:00

Performance-wise, I've added a guard to avoid looking up all node types in the newly introduced flattened_nodes vector, which should also avoid reallocations when the vector gets filled. (The capacity computation could be made even tighter, by subtracting the number of children being expanded, but I'm not sure it's worth tracking that.)

Performance-wise, I've added a guard to avoid looking up all node types in the newly introduced `flattened_nodes` vector, which should also avoid reallocations when the vector gets filled. (The capacity computation could be made even tighter, by subtracting the number of children being expanded, but I'm not sure it's worth tracking that.)

ada4a commented

2025-07-24 15:43:36 +02:00

I'm admittedly not very confident with grammars, but I also wasn't able to come up with a good in-grammar solution -- after looked at my initial attempt at #484 a bit more, I realized that I somehow completely missed the fact that types are nested and would need further flattening. So it probably is better to let the grammar create a correct nesting, and do just the flattening of the result ourselves

I'm admittedly not very confident with grammars, but I also wasn't able to come up with a good in-grammar solution -- after looked at my initial attempt at #484 a bit more, I realized that I somehow completely missed the fact that `type`s are nested and would need further flattening. So it probably is better to let the grammar create a correct nesting, and do just the flattening of the result ourselves

ada4a requested changes

2025-07-24 15:44:27 +02:00

Dismissed

ada4a left a comment

some small things to get out of the way

src/ast.rs

					
				@ -347,6 +351,22 @@ impl<'a> AstNode<'a> {

				        let grammar_name = node.grammar_name();

				        // check if this nodes needs flattening.

ada4a commented

2025-07-24 15:41:44 +02:00

s/nodes/node

ada4a marked this conversation as resolved

src/ast.rs

					
				@ -350,0 +357,4 @@

				        {

				            let mut flattened_children =

				                Vec::with_capacity(children.len() + children_with_identical_grammar_type);

				            for child in children.into_iter() {

ada4a commented

2025-07-24 15:40:59 +02:00

isn't into_iter unnecessary here? I wonder why Clippy doesn't flag it

isn't `into_iter` unnecessary here? I wonder why Clippy doesn't flag it

wetneb commented

2025-07-24 16:44:31 +02:00

Intuitively it makes sense to consume the existing children here, no? We want this variable to get dropped.
If I replace it by &children or children.iter() then flattened_children becomes a Vec<&&AstNode> instead of Vec<&AstNode>.

Intuitively it makes sense to consume the existing children here, no? We want this variable to get dropped. If I replace it by `&children` or `children.iter()` then `flattened_children` becomes a `Vec<&&AstNode>` instead of `Vec<&AstNode>`.

ada4a commented

2025-07-24 16:46:08 +02:00

No, I meant just children – that should iterate by value, just like .into_iter() would

No, I meant just `children` – that should iterate by value, just like `.into_iter()` would

👍 1

ada4a marked this conversation as resolved

src/ast.rs

					
				@ -350,0 +359,4 @@

				                Vec::with_capacity(children.len() + children_with_identical_grammar_type);

				            for child in children.into_iter() {

				                if child.grammar_name == grammar_name {

				                    flattened_children.extend(child.children.iter());

ada4a commented

2025-07-24 15:40:44 +02:00

I think this should work?

- flattened_children.extend(child.children.iter())
+ flattened_children.extend(&child.children)

I think this should work? ```diff - flattened_children.extend(child.children.iter()) + flattened_children.extend(&child.children) ```

ada4a marked this conversation as resolved

ada4a reviewed

2025-07-24 15:50:54 +02:00

src/ast.rs

					
				@ -220,6 +220,7 @@ impl<'a> AstNode<'a> {

				        let mut children = Vec::new();

				        let mut field_to_children: FxHashMap<&'a str, Vec<&'a Self>> = FxHashMap::default();

				        let mut last_child_end = node.byte_range().start;

				        let mut children_with_identical_grammar_type = 0;

ada4a commented

2025-07-24 15:50:54 +02:00

the name is not quite correct.. what we're counting is the number of children of children with an identical grammar type, but that's of course a mouthful, and I can't think of a simpler name..

the name is not quite correct.. what we're counting is the number of _children_ of children with an identical grammar type, but that's of course a mouthful, and I can't think of a simpler name..

ada4a marked this conversation as resolved

ada4a reviewed

2025-07-24 15:57:42 +02:00

src/ast.rs

					
				@ -253,2 +254,4 @@

				                    field_to_children.entry(field_name).or_default().push(child);

				                }

				                if cursor.node().grammar_id() == node.grammar_id() {

				                    children_with_identical_grammar_type += child.children.len();

ada4a commented

2025-07-24 15:57:42 +02:00

The capacity computation could be made even tighter, by subtracting the number of children being expanded, but I'm not sure it's worth tracking that.

Couldn't we do child.children.len().saturating_sub(1) here to account for that? I think that a small comment could manage to explain this just fine.

Though I guess len will never be 0 -- since the grammar type is of a foldable type, it kind of by definition has non-zero children. So we could replace saturating_sub with just a - -- it would panic in debug mode, working as an assertion, but silently wrap in release mode, which is the expected behaviour of arithmetic it seems (though I don't really understand why)

> The capacity computation could be made even tighter, by subtracting the number of children being expanded, but I'm not sure it's worth tracking that. Couldn't we do `child.children.len().saturating_sub(1)` here to account for that? I think that a small comment could manage to explain this just fine. Though I guess `len` will never be 0 -- since the grammar type is of a foldable type, it kind of by definition has non-zero children. So we could replace `saturating_sub` with just a `-` -- it would panic in debug mode, working as an assertion, but silently wrap in release mode, which is the expected behaviour of arithmetic it seems (though I don't really understand why)

wetneb commented

2025-07-24 16:01:04 +02:00

Makes a lot of sense! It's going to be even harder to find a fitting name for this variable, but I guess comments can do…

ada4a commented

2025-07-24 16:08:18 +02:00

Maybe children_added_by_flattening or something..

Maybe `children_added_by_flattening` or something..

ada4a marked this conversation as resolved

ada4a commented

2025-07-24 16:07:27 +02:00

The added complexity to the parsing code looks manageable to me. What do you think?

AstNode::internal_new is already a death by thousand cuts... I wish we could create separate-out the different processing steps, but it seems difficult given that we only really create the object at the very end of the method. This flattening in particular looks like something that could be extracted into a method somewhat easily -- it only accesses children and the children-with-identical-whatever, no?

> The added complexity to the parsing code looks manageable to me. What do you think? `AstNode::internal_new` is already a death by thousand cuts... I wish we could create separate-out the different processing steps, but it seems difficult given that we only really create the object at the very end of the method. This flattening in particular looks like something that could be extracted into a method somewhat easily -- it only accesses `children` and the children-with-identical-whatever, no?

wetneb added 2 commits

2025-07-24 16:56:41 +02:00

Follow review feedback a0b7a6addd

Pull out flattening logic to separate method

/ test (pull_request) Successful in 40s

Details

03eb963c5d

ada4a reviewed

2025-07-24 17:13:24 +02:00

src/lang_profile.rs

					
				@ -30,2 +30,4 @@

				    /// See https://tree-sitter.github.io/tree-sitter/3-syntax-highlighting.html#language-injection

				    pub injections: Option<&'static str>,

				    /// List of node types that should be flattened

				    pub flattened_nodes: Vec<&'static str>,

ada4a commented

2025-07-24 17:11:40 +02:00

This could probably be a slice, right? I don't know how well slices would play with a configuration file, but if it does turn out to be impossible, returning to vecs would be quite straightforward

wetneb commented

2025-07-24 19:40:13 +02:00

Sure! But actually, I wonder if it's really necessary to introduce this as a new field. I wonder if there is any reason not to flatten all commutative parents by default. I'm curious to investigate which of our existing commutative parents are allowed to appear as direct children of themselves.
Another option would be to have this as a boolean field on the commutative parent definition, indicating whether it should be flattened or not.

I expect this flattening feature will be used mostly (if not only) to declare commutative parents, so that would likely make language profiles a bit tidier…

Sure! But actually, I wonder if it's really necessary to introduce this as a new field. I wonder if there is any reason not to flatten all commutative parents by default. I'm curious to investigate which of our existing commutative parents are allowed to appear as direct children of themselves. Another option would be to have this as a boolean field on the commutative parent definition, indicating whether it should be flattened or not. I expect this flattening feature will be used mostly (if not only) to declare commutative parents, so that would likely make language profiles a bit tidier…

ada4a commented

2025-07-24 19:50:58 +02:00

I wonder if there is any reason not to flatten all commutative parents by default

Hm, wouldn't that be problematic for nested lists, for example? Surely we don't want to flatten this:

list
- [
- list
    - [
    - 1
    - ,
    - 2
    - ]
- ,
-  list
    - [
    - 3
    - ,
    - 4
    - ]
- ]

into this?

list
- [
- [
- 1
- ,
- 2
- ]
- ,
- [
- 3
- ,
- 4
- ]
- ]

> I wonder if there is any reason not to flatten all commutative parents by default Hm, wouldn't that be problematic for nested lists, for example? Surely we don't want to flatten this: ``` list - [ - list - [ - 1 - , - 2 - ] - , - list - [ - 3 - , - 4 - ] - ] ``` into this? ``` list - [ - [ - 1 - , - 2 - ] - , - [ - 3 - , - 4 - ] - ] ```

wetneb commented

2025-07-24 20:13:43 +02:00

Yes, but I find it weird I can't think of any example in our existing commutative parent in our existing language profiles…
In Python there is a syntax for sets, but we haven't added it as a commutative parent somehow. It would be a good concrete example of your case (apart from the fact that sets aren't hashable so t = { 1, { 2, 3 } } raises an exception)… I find that funny.
But you're right, even if we don't have concrete examples, it wouldn't be very future-proof to just mark all commutative parents as associative.

Yes, but I find it weird I can't think of any example in our existing commutative parent in our existing language profiles… In Python there is a syntax for sets, but we haven't added it as a commutative parent somehow. It would be a good concrete example of your case (apart from the fact that sets aren't hashable so `t = { 1, { 2, 3 } }` raises an exception)… I find that funny. But you're right, even if we don't have concrete examples, it wouldn't be very future-proof to just mark all commutative parents as associative.

ada4a commented

2025-07-24 20:23:18 +02:00

One non-commutative type that could be flattenable (but not really associative) are method calls I think?

👍 1

ada4a reviewed

2025-07-24 19:58:21 +02:00

src/ast.rs

					
				@ -426,0 +446,4 @@

				            Vec::with_capacity(children.len() + children_added_by_flattening);

				        for child in children {

				            if child.grammar_name == grammar_name {

				                flattened_children.extend(&child.children);

ada4a commented

2025-07-24 19:58:21 +02:00

Actually, couldn't this even be just child.children? Since we're consuming child anyway, we might as well move the fileld out of it

Actually, couldn't this even be just `child.children`? Since we're consuming `child` anyway, we might as well move the fileld out of it

wetneb commented

2025-07-25 08:38:50 +02:00

I'd find it quite impressive if it worked, but the borrow checker doesn't seem to appreciate the idea:

cannot move out of `child.children` which is behind a shared reference
move occurs because `child.children` has type `Vec<&AstNode<'_>>`, which does not implement the `Copy` trait

I'd find it quite impressive if it worked, but the borrow checker doesn't seem to appreciate the idea: ``` cannot move out of `child.children` which is behind a shared reference move occurs because `child.children` has type `Vec<&AstNode<'_>>`, which does not implement the `Copy` trait ```

ada4a commented

2025-07-25 19:17:20 +02:00

ah, right, child is still &AstNode, so you can't move (a field) out of that

ah, right, `child` is still `&AstNode`, so you can't move (a field) out of that

ada4a marked this conversation as resolved

wetneb added 1 commit

2025-07-25 08:39:20 +02:00

Switch from vector to slice

/ test (pull_request) Successful in 40s

Details

7ee2a3b33b

ada4a approved these changes