feat: Support for pyproject.toml #582

Merged
wetneb merged 4 commits from wetneb/mergiraf:pyproject into main 2025-09-14 20:16:16 +02:00
Owner

This is an attempt at enabling the commutativity of specific fields in a `pyproject.toml` file, following [their specs](https://packaging.python.org/en/latest/specifications/pyproject-toml/).
My impression is that such files are widespread enough to justify having their own language profile, especially because it doesn't cost us much in terms of binary size (as we're reusing a parser that we already include).
Note that I've added the language profile before the TOML one, so that the more specific `pyproject.toml` one is encountered first. Ideally we should not have to worry about this order of definitions and the most specific profile would get selected automatically.

This is prompted by a request from Stainless but I'm doing this pro bono (it doesn't actually solve their motivating conflict, which appears in a `tool` section of the `pyproject.toml`).
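
As a rough illustration of what declaring a field commutative buys during a merge: if the order of the elements carries no meaning, two sides that each add a different entry to the same array no longer need to conflict. The sketch below treats the array as a set and performs a three-way merge on it. The function name `merge_unordered` and the set-based approach are assumptions made for illustration only; this is not Mergiraf's actual algorithm, which works on syntax trees.

```rust
use std::collections::BTreeSet;

/// Three-way merge of a list whose element order carries no meaning
/// (for instance the `dependencies` array of a `[project]` table).
/// Hypothetical helper for illustration; not Mergiraf's implementation.
fn merge_unordered(base: &[&str], left: &[&str], right: &[&str]) -> Vec<String> {
    let base: BTreeSet<&str> = base.iter().copied().collect();
    let left: BTreeSet<&str> = left.iter().copied().collect();
    let right: BTreeSet<&str> = right.iter().copied().collect();

    left.union(&right)
        .copied()
        // Keep an element unless it was present in the base and
        // at least one side deleted it.
        .filter(|e| !base.contains(e) || (left.contains(e) && right.contains(e)))
        .map(|e| e.to_string())
        // The result comes out in sorted order, because BTreeSet iterates sorted.
        .collect()
}
```

With a base of `["requests"]`, one side adding `"numpy"` and the other adding `"pandas"`, the sketch keeps all three entries instead of reporting a conflict.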

feat: Support for pyproject.toml
All checks were successful
/ test (pull_request) Successful in 44s
702dd3241a
wetneb changed title from feat: Support for pyproject.toml to feat: Support for pyproject.toml 2025-09-03 08:56:54 +02:00
Owner

> Ideally we should not have to worry about this order of definitions and the most specific profile would get selected automatically.

For that, we would probably first try to match by the whole filename, and otherwise fall back to a search by extension. Sounds simple enough, I'll try doing that after this PR.
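
A minimal sketch of that filename-first lookup, assuming a simplified `Profile` struct with `file_names` and `extensions` fields. These names, and the `select_profile` function, are hypothetical and do not mirror the real `LangProfile` definition:

```rust
use std::path::Path;

/// Hypothetical, simplified stand-in for a language profile.
struct Profile {
    name: &'static str,
    /// Exact file names this profile claims (e.g. "pyproject.toml").
    file_names: &'static [&'static str],
    /// File extensions this profile claims, without the leading dot.
    extensions: &'static [&'static str],
}

/// Pick a profile for `path`: an exact file name match wins over an
/// extension match, regardless of the order in which profiles are declared.
fn select_profile<'a>(profiles: &'a [Profile], path: &Path) -> Option<&'a Profile> {
    let file_name = path.file_name()?.to_str()?;
    // First pass: look for a profile that claims the whole file name.
    if let Some(profile) = profiles
        .iter()
        .find(|p| p.file_names.iter().any(|n| *n == file_name))
    {
        return Some(profile);
    }
    // Second pass: fall back to matching on the extension only.
    let extension = path.extension()?.to_str()?;
    profiles
        .iter()
        .find(|p| p.extensions.iter().any(|e| *e == extension))
}
```

With a lookup like this, the generic TOML profile could be declared before the `pyproject.toml` one without shadowing it, since the extension is only consulted once no profile claims the full file name.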

@ -588,6 +588,44 @@ pub static SUPPORTED_LANGUAGES: LazyLock<Vec<LangProfile>> = LazyLock::new(|| {
injections: None,
flattened_nodes: &[],
},
LangProfile {
Owner

Could you please add (some version of) the following as a comment here?

> Note that I've added the language profile before the TOML one, so that the more specific `pyproject.toml` one is encountered first.

We can remove it once we implement filename-first search, but let's keep it for now, just to be sure.

ada4a marked this conversation as resolved
@ -591,0 +610,4 @@
r#"(table
(bare_key) @table_key (#any-of? @table_key "build-system" "project")
(pair
(bare_key) @pair_key (#any-of? @pair_key "requires" "license-files" "authors" "maintainers" "keywords" "classifiers" "dependencies" "dynamic")
Owner

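I think the arrays in the values of `project.optional-dependencies` [are commutative as well](https://packaging.python.org/en/latest/specifications/pyproject-toml/#dependencies-optional-dependencies)?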

I tried to build a query for that (my very first one, so don't judge too harshly^^):

(table
  (bare_key) @table_key (#eq? @table_key "project") 
  (pair
    (bare_key) @pair_key (#eq? @pair_key "optional-dependencies")
    (inline_table
      (pair
        _
        (array) @array
      )*
    )
  )
)

but when I tried running it on the following snippet:

[project]
optional-dependencies = {
  foo = ["one", "two", "three"]
  bar = ["four", "five", "six"]
}

only the second array (`["four", "five", "six"]`) got highlighted, whereas I thought that the `*` in `(pair)*` would make its inner pattern match _each_ key-value pair, and thus extract all the arrays.

Owner

One more complication with this is that the snippet above could, AFAICT, be written as follows as well (and that would probably even be the more common approach):

[project.optional-dependencies]
foo = ["one", "two", "three"]
bar = ["four", "five", "six"]

which the query above of course wouldn't match. So I did write a separate query:

(table
  (dotted_key
    (bare_key) @project (#eq? @project "project")
    (bare_key) @deps (#eq? @deps "optional-dependencies")
  )
  (pair
    (bare_key)
    (array) @array
  )*
)

and that one actually does match _both_ arrays, but does that mean that we will need two queries just for this pattern alone? That sounds a bit unfortunate.

Author
Owner

Somehow the TOML example you give above doesn't seem to parse with the tree-sitter grammar, nor with https://www.toml-lint.com/, and I'm not sure why (beyond the missing comma, perhaps, but that doesn't seem to fix it).

I'm adding the query for the second syntax, removing the `*` because it's not necessary: the query only needs to match one commutative child at a time (each commutative child will correspond to a different match of the overall query).

Owner

> Somehow the TOML example you give above doesn't seem to parse with the tree-sitter grammar, nor with https://www.toml-lint.com/, and I'm not sure why (beyond the missing comma, perhaps, but that doesn't seem to fix it).

Ah, right, that's because inline tables are apparently required to be on one line, so the following does parse:

[project]
optional-dependencies = { foo = ["one", "two", "three"], bar = ["four", "five", "six"] }

> removing the `*` because it's not necessary

I see! In fact removing it from the first query makes it highlight both commutative children as well!

Add comment
All checks were successful
/ test (pull_request) Successful in 40s
f50de49975
Add query for optional-dependencies
All checks were successful
/ test (pull_request) Successful in 39s
0de2d7474f
Add query for optional deps as inline tables
All checks were successful
/ test (pull_request) Successful in 40s
124b38b91b
ada4a approved these changes 2025-09-14 20:00:37 +02:00
ada4a left a comment
Owner

Nice!

wetneb merged commit 4f7e110d1c into main 2025-09-14 20:16:16 +02:00
Reference: mergiraf/mergiraf#582