test(supported_langs): check that commutative parents' children all have signatures defined #507

					
				@ -167,0 +181,4 @@

				                            // `field_declaration` above -- is there a way to reuse it instead?

				                            Field("declarator"),

				                            Field("name"),

				                        ],

wetneb commented

2025-07-14 22:01:20 +02:00

I'm surprised by the three components of this signature. I would expect that it's not possible to declare two records with the same name, so vec![Field("name")] would be enough, no?

I'm surprised by the three components of this signature. I would expect that it's not possible to declare two records with the same name, so `vec![Field("name")]` would be enough, no?

ada4a commented

2025-07-14 22:06:25 +02:00

I don't really know if I'm being honest. I took this from method_declaration, which does need it because of overloading, but I guess structs/records/enums don't?

I don't really know if I'm being honest. I took this from `method_declaration`, which does need it because of overloading, but I guess structs/records/enums don't?

wetneb commented

2025-07-14 22:09:58 +02:00

Yes, not only they don't need it, but adding it will prevent the signatures to be equal if the records have different fields.

For instance, in a case like this:

public record Person (String name, String address) {}

public record Person (String firstName, String lastName, String address) {}

we do want to flag those as being duplicates, because the Java compiler will not accept that.
With the signature definition you have, those two declarations would get different signatures, so we wouldn't detect that they are in conflict.

Yes, not only they don't need it, but adding it will prevent the signatures to be equal if the records have different fields. For instance, in a case like this: ```java public record Person (String name, String address) {} public record Person (String firstName, String lastName, String address) {} ``` we do want to flag those as being duplicates, because the Java compiler will not accept that. With the signature definition you have, those two declarations would get different signatures, so we wouldn't detect that they are in conflict.

ada4a commented

2025-07-14 22:12:21 +02:00

You're totally right, I'll go fix it

ada4a marked this conversation as resolved

ada4a force-pushed comm-parents-check-have-all-sigs from 76fb1168e8 to d38fde0970

2025-07-14 22:14:07 +02:00

Compare

ada4a commented

2025-07-14 22:26:55 +02:00

In Rust (I decided to work on a language I actually know something about^^), let_declarations are marked as children of commutative parents of both declaration_list (e.g. inside an impl block) and source_file, but I don't think either of those is even allowed to include let-declarations?

I think the reason we have this is that in the node-types.json, we have both declaration_list and source_file list _declaration_statement as one of the possible child types, and the latter in turn has let_declaration as one of its kinds. So this is probably an oversimplification on the grammar side, which we need not follow?

In Rust (I decided to work on a language I actually know something about^^), `let_declaration`s are marked as children of commutative parents of both `declaration_list` (e.g. inside an impl block) and `source_file`, but I don't think either of those is even allowed to include let-declarations? I think the reason we have this is that in the `node-types.json`, we have both `declaration_list` and `source_file` list `_declaration_statement` as one of the possible child types, and the latter in turn has `let_declaration` as one of its kinds. So this is probably an oversimplification on the grammar side, which we need not follow?

wetneb commented

2025-07-14 22:32:10 +02:00

That's a good question. I think it's indeed a simplification. One could try making the grammar tighter and seeing if it still passes the test suite (but it's probably not worth adding more complexity to the parser for the sake of introducing this restriction).

Removing let_declarations from the children groups would probably make sense.

That's a good question. I think it's indeed a simplification. One could try making the grammar tighter and seeing if it still passes the test suite (but it's probably not worth adding more complexity to the parser for the sake of introducing this restriction). Removing `let_declaration`s from the children groups would probably make sense.

👍 1

ada4a added 4 commits

2025-07-15 19:44:30 +02:00

rm mentions of Rust/let_declaration 58ab5713e4

not actually a commutative child

add Rust/extern_crate_declaration c2b7c6998c

WIP: add Rust/where_predicate 76b40e7ae7

WIP: add Rust/impl_item

/ test (pull_request) Failing after 33s

Details

26ca1386ac

need to decide what the signature should be

ada4a reviewed

2025-07-15 19:46:25 +02:00

src/supported_langs.rs Outdated

					
				@ -262,9 +274,9 @@ pub static SUPPORTED_LANGUAGES: LazyLock<Vec<LangProfile>> = LazyLock::new(|| {

				                        "type_item",

				                        "function_item",

				                        "function_signature_item",

				                        "impl_item",

ada4a commented

2025-07-15 19:46:25 +02:00

looks like this:

impl Foo
where
    Foo: Copy,
{
    fn foo() -> f64 {}
    const TRUE: bool = true;
}

parses like this:

└source_file Commutative
  └impl_item
    ├impl
    ├type: identifier Foo
    ├where_clause Commutative
    │ ├where
    │ ├where_predicate Signature [[Foo], [: Copy]] // this is WIP, ignore for now
    │ │ └ ...
    │ └,
    └body: declaration_list Commutative
      ├{
      ├function_item Signature [[foo]]
      │ └ ...
      ├const_item Signature [[TRUE]]
      │ └ ...
      └}

Firstly, this can very well be marked as a commutative child, since impl blocks are reorderable.

Secondly, if we do add them as a commutative child, what would their signature be?

Theoretically, nothing stops one from having two impl blocks for the same type in the same file -- though normally one would either:

create two blocks only if they have different where-bounds (e.g. HashMap::new has no bounds, but HashMap::insert requires the key to be hashable, so you put them in different blocks)
create two blocks with the same (missing) bounds, and get a Clippy warning (though not by default apparently)

So in order to detect identical impl blocks, we would need to consider all of:

item that the impl is for
where bounds
all the items defined inside? this one sounds especially daunting
apparently, impl Trait for Foo is also an impl_item?.. In that case, we'd also need to look at the type field, where Trait is stored in this case. Although in that case, having two impl X for Y for the same (X,Y) is disallowed

looks like this: ```rs impl Foo where Foo: Copy, { fn foo() -> f64 {} const TRUE: bool = true; } ``` parses like this: ``` └source_file Commutative └impl_item ├impl ├type: identifier Foo ├where_clause Commutative │ ├where │ ├where_predicate Signature [[Foo], [: Copy]] // this is WIP, ignore for now │ │ └ ... │ └, └body: declaration_list Commutative ├{ ├function_item Signature [[foo]] │ └ ... ├const_item Signature [[TRUE]] │ └ ... └} ``` Firstly, this can very well be marked as a commutative child, since impl blocks are reorderable. Secondly, if we do add them as a commutative child, what would their signature be? Theoretically, nothing stops one from having two impl blocks for the same type in the same file -- though normally one would either: - create two blocks only if they have different where-bounds (e.g. `HashMap::new` has no bounds, but `HashMap::insert` requires the key to be hashable, so you put them in different blocks) - create two blocks with the same (missing) bounds, and get a [Clippy warning](https://rust-lang.github.io/rust-clippy/master/index.html#multiple_inherent_impl) (though not by default apparently) So in order to detect _identical_ impl blocks, we would need to consider all of: - item that the impl is for - where bounds - all the items defined inside? this one sounds especially daunting - apparently, `impl Trait for Foo` is also an `impl_item`?.. In that case, we'd also need to look at the `type` field, where `Trait` is stored in this case. Although in that case, having two `impl X for Y` for the same (X,Y) _is_ disallowed

wetneb commented

2025-07-23 16:09:11 +02:00

That's a tricky case indeed. I guess we could change the grammar so that it parses impl Trait for Foo as a different node type. Then, for that node type, we could define a signature which bundles up:

the trait
the type
the where_clause if any

And those all together should be unique in a given file, no?

For the general impl Foo { … } case, we could perhaps also just use vec![vec![]] as signature… (and maybe it's also a sensible temporary solution for impl Trait for Foo in the meantime)

That's a tricky case indeed. I guess we could change the grammar so that it parses `impl Trait for Foo` as a different node type. Then, for that node type, we could define a signature which bundles up: * the trait * the type * the where_clause if any And those all together should be unique in a given file, no? For the general `impl Foo { … }` case, we could perhaps also just use `vec![vec![]]` as signature… (and maybe it's also a sensible temporary solution for `impl Trait for Foo` in the meantime)

ada4a commented

2025-07-23 16:17:55 +02:00

I guess we could change the grammar so that it parses impl Trait for Foo as a different node type

I mean I understand why the grammar might not want to concern itself with that -- highlighting and stuff don't really change just because of the Trait for part. But if it would be possible to reuse "impl_body" in both impl_item and "impl_trait_item", then it could probably work I think. Though interestingly enough, not even the compiler distinguishes between the two, neither in the AST nor in the HIR.

All of this makes me think that maybe we should just restrict ourselves to vec![vec![]] after all..

> I guess we could change the grammar so that it parses impl Trait for Foo as a different node type I mean I understand why _the grammar_ might not want to concern itself with that -- highlighting and stuff don't really change just because of the `Trait for` part. But if it would be possible to reuse "`impl_body`" in both `impl_item` and "`impl_trait_item`", then it could probably work I think. Though interestingly enough, not even the compiler distinguishes between the two, neither [in the AST](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/struct.Impl.html) nor [in the HIR](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Impl.html). All of this makes me think that maybe we should just restrict ourselves to `vec![vec![]]` after all..

👍 1

ada4a marked this conversation as resolved

ada4a force-pushed comm-parents-check-have-all-sigs from 26ca1386ac to fdf564ce76

2025-07-15 20:01:37 +02:00

Compare

ada4a reviewed

2025-07-15 20:04:33 +02:00

src/supported_langs.rs Outdated

					
				@ -332,2 +344,4 @@

				                // source_file

				                signature("use_declaration", vec![vec![Field("argument")]]),

				                signature("extern_crate_declaration", vec![vec![Field("name")]]),

				                signature("foreign_mod_item", todo!()),

ada4a commented

2025-07-15 20:04:33 +02:00

this is similar to impl_item, in that one can have multiple extern "C" { ... } items in the source code, and the only real way to find out if they are identical is to go through all the functions defined inside the block (and the foreign ABI convention ("C" here) as well, of course)

this is similar to `impl_item`, in that one can have multiple `extern "C" { ... }` items in the source code, and the only real way to find out if they are identical is to go through all the functions defined inside the block (and the [foreign ABI convention](https://doc.rust-lang.org/nomicon/ffi.html#foreign-calling-conventions) (`"C"` here) as well, of course)

wetneb commented

2025-07-23 15:56:41 +02:00

Then maybe use vec![vec![]] as signature? It means the whole node is treated as a signature, which would at least prevent isomorphic elements from being added.

Then maybe use `vec![vec![]]` as signature? It means the whole node is treated as a signature, which would at least prevent isomorphic elements from being added.

ada4a commented

2025-07-23 15:59:07 +02:00

makes sense!

ada4a marked this conversation as resolved

ada4a commented

2025-07-15 20:07:48 +02:00

The reason I push these not-quite-ready commits is so that we could have one discussion thread for each signature (since I feel like there are many--different--non-trivial cases) -- of course, I would've preferred to be able to open threads here directly...

ada4a reviewed

2025-07-15 20:39:19 +02:00

src/supported_langs.rs Outdated

					
				@ -348,0 +362,4 @@

				                // where_clause

				                signature(

				                    "where_predicate",

				                    vec![vec![Field("left")], vec![Field("bounds")]],

ada4a commented

2025-07-15 20:39:19 +02:00

here's an example of its use:

the where clause in this:

fn foo<T>()
where
    T: Sized,
    T: Send,
{
}

gets parsed as this:

└where_clause Commutative
  ├where
  ├where_predicate
  │ ├left: identifier T
  │ └bounds: trait_bounds Commutative
  │   ├:
  │   └identifier Sized Signature [[Sized]]
  ├,
  ├where_predicate
  │ ├left: identifier T
  │ └bounds: trait_bounds Commutative
  │   ├:
  │   └identifier Send Signature [[Send]]
  └,

and we're asked to provide signatures for the where_predicates.

I initially went with the approach of left + (list of all identifiers in bound). But then I saw that use_declaration uses the simpler approach, where it creates the signature from just argument, which is everything that follows use. And so I thought where_predicate could do the same for the right half of its signature.

Although I'm not quite sure that the argument thing is safe to do in the first place? I think the following two use declarations:

use foo::{Bar, Baz};
use foo::{
    Bar,
    Baz,
};

would technically have different arguments (and thus different signatures) because of the different formatting? At least Ast::ascii_tree does show the signatures as with the formatting respected

here's an example of its use: the where clause in this: ```rs fn foo<T>() where T: Sized, T: Send, { } ``` gets parsed as this: ``` └where_clause Commutative ├where ├where_predicate │ ├left: identifier T │ └bounds: trait_bounds Commutative │ ├: │ └identifier Sized Signature [[Sized]] ├, ├where_predicate │ ├left: identifier T │ └bounds: trait_bounds Commutative │ ├: │ └identifier Send Signature [[Send]] └, ``` and we're asked to provide signatures for the `where_predicate`s. I initially went with the approach of `left` + (list of all `identifier`s in `bound`). But then I saw that `use_declaration` uses the simpler approach, where it creates the signature from just `argument`, which is everything that follows `use`. And so I thought `where_predicate` could do the same for the right half of its signature. Although I'm not _quite_ sure that the `argument` thing is safe to do in the first place? I think the following two use declarations: ```rs use foo::{Bar, Baz}; use foo::{ Bar, Baz, }; ``` would _technically_ have different `argument`s (and thus different signatures) because of the different formatting? At least `Ast::ascii_tree` does show the signatures as with the formatting respected

wetneb commented

2025-07-23 15:59:10 +02:00

What you did looks good to me. It's true that it will fail to identify bounds that are slightly different, but it's the best we can do given the system we have.

Concerning your example about use statements above, the deciding factor that would make the signatures different is the trailing comma (the formatting differences shouldn't make a difference since nodes in signatures are treated up to isomorphism).

What you did looks good to me. It's true that it will fail to identify bounds that are slightly different, but it's the best we can do given the system we have. Concerning your example about `use` statements above, the deciding factor that would make the signatures different is the trailing comma (the formatting differences shouldn't make a difference since nodes in signatures are treated up to isomorphism).

👍 1

ada4a marked this conversation as resolved

ada4a force-pushed comm-parents-check-have-all-sigs from fdf564ce76 to abc6d55858

2025-07-23 16:04:53 +02:00

Compare

ada4a force-pushed comm-parents-check-have-all-sigs from abc6d55858 to e7abb14911

2025-07-23 16:06:25 +02:00

Compare

ada4a force-pushed comm-parents-check-have-all-sigs from e7abb14911 to 0bb42380ca

2025-07-26 20:29:19 +02:00

Compare

ada4a force-pushed comm-parents-check-have-all-sigs from 0bb42380ca to a5098e3472

2025-07-26 20:32:42 +02:00

Compare

ada4a added 1 commit

2025-07-26 21:14:09 +02:00

add the rest of Java signatures

/ test (pull_request) Failing after 36s

Details

45b8c682f2

ada4a added 2 commits

2025-07-26 21:42:16 +02:00

add Kotlin/property_declaration ea211820f1

WIP: add Kotlin/annotation

/ test (pull_request) Failing after 34s

Details

b63c22a5bd

ada4a reviewed

2025-07-26 22:03:05 +02:00

src/supported_langs.rs Outdated

					
				@ -239,1 +282,4 @@

				                signature("delegation_specifier", vec![vec![]]),

				                // modifiers

				                signature(

				                    "annotation",

ada4a commented

2025-07-26 22:03:05 +02:00

So the Kotlin grammar is a bit annoying -- it seems to only ever use ChildTypes, which in the particular case of annotation causes the following problem:

An annotation normally looks like this:

@Anno

but it can optionally include a use site target, for example:

@field:Anno

And the way the grammar defines this use_site_target is as follows: 3dea6dfa9c/grammar.js (L534-L537), so what we get from parsing is something like this:

annotation Signature [[Anno]]
 ├@
 ├use_site_target
 │ ├field
 │ └:
 └user_type
   └identifier Anno

so we'd need to add an alternative sigdef path for all possible variants of use_site_target (similar to #540)

Or, much more simply, change the sigdef to be vec![vec![]], since there aren't that many moving parts there anyway...

Well, not really, there's also this.. thing: https://kotlinlang.org/docs/annotations.html#arrays-as-annotation-parameters, so this:

@AnnWithArrayMethod(names = ["abc", "foo", "bar"])

gets parsed to this:

annotation Signature [[]]
 ├@
 └constructor_invocation
   ├user_type
   │ └identifier AnnWithArrayMethod
   └value_arguments
     ├(
     ├value_argument
     │ ├identifier names
     │ ├=
     │ └collection_literal
     │   ├[
     │   ├string_literal
     │   │ ├"
     │   │ ├string_content abc
     │   │ └"
     │   ├,
     │   ├string_literal
     │   │ ├"
     │   │ ├string_content foo
     │   │ └"
     │   ├,
     │   ├string_literal
     │   │ ├"
     │   │ ├string_content bar
     │   │ └"
     │   └]
     └)

and so doesn't even pass into the sigdef I currently have, but also wouldn't quite work with the naive vec![vec![]], because I think the parameters aren't important as far as the duplicate signature check goes?

Although it looks like none of this is important, because there's a meta-attribute that can allow an attribute to be used multiple times: https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.annotation/-repeatable/

This is a bit too much, I'll go touch some grass for a bit..

So the Kotlin grammar is a bit annoying -- it seems to only ever use `ChildType`s, which in the particular case of `annotation` causes the following problem: An annotation normally looks like this: ```kt @Anno ``` but it can optionally include a [use site target](https://kotlinlang.org/docs/annotations.html#annotation-use-site-targets), for example: ```kt @field:Anno ``` And the way the grammar defines this `use_site_target` is as follows: https://github.com/tree-sitter-grammars/tree-sitter-kotlin/blob/3dea6dfa9c0129deb7c4315afbda806c85c41667/grammar.js#L534-L537, so what we get from parsing is something like this: ``` annotation Signature [[Anno]] ├@ ├use_site_target │ ├field │ └: └user_type └identifier Anno ``` so we'd need to add an alternative sigdef path for all possible variants of `use_site_target` (similar to #540) Or, much more simply, change the sigdef to be `vec![vec![]]`, since there aren't that many moving parts there anyway... Well, not really, there's also this.. thing: https://kotlinlang.org/docs/annotations.html#arrays-as-annotation-parameters, so this: ```kt @AnnWithArrayMethod(names = ["abc", "foo", "bar"]) ``` gets parsed to this: ``` annotation Signature [[]] ├@ └constructor_invocation ├user_type │ └identifier AnnWithArrayMethod └value_arguments ├( ├value_argument │ ├identifier names │ ├= │ └collection_literal │ ├[ │ ├string_literal │ │ ├" │ │ ├string_content abc │ │ └" │ ├, │ ├string_literal │ │ ├" │ │ ├string_content foo │ │ └" │ ├, │ ├string_literal │ │ ├" │ │ ├string_content bar │ │ └" │ └] └) ``` and so doesn't even pass into the sigdef I currently have, but also wouldn't quite work with the naive `vec![vec![]]`, because I think the parameters aren't important as far as the duplicate signature check goes? Although it looks like none of this is important, because there's a meta-attribute that can allow an attribute to be used multiple times: https://kotlinlang.org/api/core/kotlin-stdlib/kotlin.annotation/-repeatable/ This is a bit too much, I'll go touch some grass for a bit..

ada4a commented

2025-08-01 23:35:46 +02:00

decided to go with plain vec![vec![]], just like we do for annotations in Java

decided to go with plain `vec![vec![]]`, just like we do for annotations in Java

ada4a force-pushed comm-parents-check-have-all-sigs from b63c22a5bd to 0957e3a0a2

2025-08-01 23:36:11 +02:00

Compare

ada4a added 1 commit

2025-08-02 00:07:43 +02:00

add Go/import_declaration

/ test (pull_request) Failing after 31s

Details

2bb1b592eb

ada4a commented

2025-08-06 09:24:33 +02:00

I would propose we merge what we've got so far (except the test of course), and then continue? Especially if you're planning to do a release soon-ish

wetneb commented

2025-08-06 09:29:49 +02:00

Makes sense!

ada4a referenced this pull request

2025-08-14 17:23:59 +02:00

feat: add some signatures for children of commutative parents #567

ada4a referenced this pull request from a commit

2025-08-14 19:32:04 +02:00

feat: add some signatures for children of commutative parents (#567)

ada4a force-pushed comm-parents-check-have-all-sigs from 2bb1b592eb to e0ff006144

2025-08-14 19:37:26 +02:00

Compare

ada4a commented

2025-08-14 19:52:32 +02:00

Ok the extraction thing was a bit awkward.. Let's do this another way:

in this PR, add #[ignore = "not all signatures are added yet"] to the test, and merge it
add further signatures in separate PRs -- they can temporarily remove the #[ignore] for testing, but shouldn't include that change in the final version

Alternatively, we could do my favorite thing: a UI test! For that, we could:

don't add the #[ignore] to the test
instead, save the current list of node types with missing signatures to a file
in the test, compare that list with the one we get from the test

This has the advantage that, when someone adds a new language, and doesn't add all the necessary signatures, they'll get a test failure that shows them which signatures are missing in particular.

In both cases, we might want to open a tracking issue for all the remaining signatures. There, for each language with missing signatures, we could tag the people who added that particular language, with the hope that it'd be easier for them to add those since they're more familiar with the particular language's semantics than.. me.

What do you think?

Ok the extraction thing was a bit awkward.. Let's do this another way: 1. in this PR, add `#[ignore = "not all signatures are added yet"]` to the test, and merge it 2. add further signatures in separate PRs -- they can temporarily remove the `#[ignore]` for testing, but shouldn't include that change in the final version Alternatively, we could do my favorite thing: a UI test! For that, we could: 1. don't add the `#[ignore]` to the test 2. instead, save the current list of node types with missing signatures to a file 3. in the test, compare that list with the one we get from the test This has the advantage that, when someone adds a new language, and doesn't add all the necessary signatures, they'll get a test failure that shows them which signatures are missing in particular. In both cases, we might want to open a tracking issue for all the remaining signatures. There, for each language with missing signatures, we could tag the people who added that particular language, with the hope that it'd be easier for them to add those since they're more familiar with the particular language's semantics than.. me. What do you think?

wetneb commented

2025-10-05 12:00:20 +02:00

Sorry, I completely missed that last message…
Both solutions look good to me.
On the long term I would like to list all possible children of a commutative parent (not just the ones that are mentioned in the language profile) to do this check. But it's much more involved so we clearly shouldn't let that hold this PR back.

Sorry, I completely missed that last message… Both solutions look good to me. On the long term I would like to list all possible children of a commutative parent (not just the ones that are mentioned in the language profile) to do this check. But it's much more involved so we clearly shouldn't let that hold this PR back.

/ test (pull_request) Failing after 39s

Details

This pull request has changes conflicting with the target branch.

src/supported_langs.rs

View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.

git fetch -u origin comm-parents-check-have-all-sigs:comm-parents-check-have-all-sigs

git switch comm-parents-check-have-all-sigs

Sign in to join this conversation.

No reviewers

No milestone

No project

No assignees

2 participants

Notifications

Due date

The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: mergiraf/mergiraf#507

Rows
Columns