Minimal CMake support #572

Merged
wetneb merged 4 commits from OvidiusCicero/mergiraf:feat/cmake_support into main 2025-08-23 17:56:04 +02:00
Contributor

Minimal CMake support without commutative stuff and only one test. Is this enough for you guys?

On a related note, are there tipps how to run mergiraf for this or Makefiles with Jujutsu? Jujutsu runs it through mergiraf merge <base> <left> <right>, changing the file name which obstructs language recognition which relies on CmakeFileLists.txt—a similar problem exists with Makefile

Minimal CMake support without commutative stuff and only one test. Is this enough for you guys? On a related note, are there tipps how to run mergiraf for this or Makefiles with Jujutsu? Jujutsu runs it through `mergiraf merge <base> <left> <right>`, changing the file name which obstructs language recognition which relies on `CmakeFileLists.txt`—a similar problem exists with `Makefile`
Owner

Thanks for the PR! I got side-tracked by looking into the jujutsu problem (thanks for reporting it too), so first I'm off to making a PR over there to address the problem you brought up. :)

Thanks for the PR! I got side-tracked by looking into the jujutsu problem (thanks for reporting it too), so first I'm off to [making a PR over there](https://github.com/jj-vcs/jj/pull/7292) to address the problem you brought up. :)
OvidiusCicero force-pushed feat/cmake_support from 39b3548d02 to 6e9d496fb1 2025-08-19 17:55:51 +02:00 Compare
wetneb approved these changes 2025-08-21 18:26:48 +02:00
wetneb left a comment
Owner

That works for me. I don't know CMake well enough to be able to suggest commutative parents to add.
Happy to merge soon unless others have suggested changes.

That works for me. I don't know CMake well enough to be able to suggest commutative parents to add. Happy to merge soon unless others have suggested changes.
Owner

I don't know CMake well enough to be able to suggest commutative parents to add.

Yeah I'm not familiar with it either. @mathstuf maybe you?

> I don't know CMake well enough to be able to suggest commutative parents to add. Yeah I'm not familiar with it either. @mathstuf maybe you?
Author
Contributor

So the thing is that the CMake language is macro based and even though there are arguments to functions which are indeed commutative it's on semantical but not really syntactical level

For example, for adding a library one would do

add_library(mylib SHARED
    sources/animation.cpp
    sources/buffers.cpp
    [...]
)

and the parsed tree looks like the following

  └normal_command
    ├identifier add_library
    ├(
    ├argument_list
    │ ├argument
    │ │ └unquoted_argument mylib
    │ ├argument
    │ │ └unquoted_argument SHARED
    │ ├argument
    │ │ └unquoted_argument sources/animation.cpp
    │ └argument
    │   └unquoted_argument sources/buffers.cpp
    └)

where the sources should be commutative. However the name of the library must be at the beginning but is in the parsing result just an argument and the SHARED argument is an option which must come afterwards but is also just an argument. I am pretty sure one can set options to sources is possible as well—if not for add_library then for other similar stuff. Writing these rules would at least be very involved.

So the thing is that the CMake language is macro based and even though there are arguments to functions which are indeed commutative it's on semantical but not really syntactical level For example, for adding a library one would do ~~~ add_library(mylib SHARED sources/animation.cpp sources/buffers.cpp [...] ) ~~~ and the parsed tree looks like the following ~~~ └normal_command ├identifier add_library ├( ├argument_list │ ├argument │ │ └unquoted_argument mylib │ ├argument │ │ └unquoted_argument SHARED │ ├argument │ │ └unquoted_argument sources/animation.cpp │ └argument │ └unquoted_argument sources/buffers.cpp └) ~~~ where the sources should be commutative. However the name of the library must be at the beginning but is in the parsing result just an argument and the SHARED argument is an option which must come afterwards but is also just an argument. I am pretty sure one can set options to sources is possible as well—if not for `add_library` then for other similar stuff. Writing these rules would at least be very involved.
Member

Yeah, I'm pretty familiar with CMake :) . I suspect this is one place where #565 would be useful (esp. given that the historical wart named CMakeLists.txt kind of falls in a hole of detect-language-by-extension algorithms :/ ).

I don't have time right now, but I can take a look. Please add tests in the meantime…they'll help a lot as I'm not intimately familiar with mergiraf internals. Ideas for tests:

  • merging argument lists: if both sides added to a list with no ALL_UPPERCASE_ARGS between them, assume it is "both get added" resolution
  • renaming an ALL_UPPER arg next to the addition of an argument after it should resolve to the rename and the addition
  • move and rename an argument (even across an ALL_UPPER arg span

I can't think of other common conflict resolutions that wouldn't need manual intervention anyways, but these seem helpful to start with. It also really depends on what the tree-sitter for CMake actually provides.

Yeah, I'm [pretty familiar](https://github.com/Kitware/CMake/graphs/contributors) with CMake :) . I suspect this is one place where #565 would be useful (esp. given that the historical wart named `CMakeLists.txt` kind of falls in a hole of detect-language-by-extension algorithms :/ ). I don't have time right now, but I can take a look. Please add tests in the meantime…they'll help a lot as I'm not intimately familiar with `mergiraf` internals. Ideas for tests: - merging argument lists: if both sides added to a list with no `ALL_UPPERCASE_ARGS` between them, assume it is "both get added" resolution - renaming an `ALL_UPPER` arg next to the addition of an argument after it should resolve to the rename and the addition - move and rename an argument (even across an `ALL_UPPER` arg span I can't think of other common conflict resolutions that wouldn't need manual intervention anyways, but these seem helpful to start with. It also really depends on what the tree-sitter for CMake actually provides.
Owner

@OvidiusCicero is this kind of special parameter handling something unique to add_library, or does that happen in other functions as well? I guess in the latter case there is indeed not much more that we, or the grammar, could do, unfortunately

@mathstuf the "maybe" was a bit tongue-in-cheek, yes^^

@OvidiusCicero is this kind of special parameter handling something unique to `add_library`, or does that happen in other functions as well? I guess in the latter case there is indeed not much more that we, or the grammar, could do, unfortunately @mathstuf the "maybe" was a bit tongue-in-cheek, yes^^
Author
Contributor

I'll add a test for moving and renaming an argument but your other recommendations don't make sense since the underlying grammar does not differentiate between normal and UPPERCASE arguments I think

I'll add a test for moving and renaming an argument but your other recommendations don't make sense since the underlying grammar does not differentiate between normal and UPPERCASE arguments I think
Author
Contributor

@ada4a I'm not ultra familar with the language itself but it definitely happens at other functions as well

@ada4a I'm not ultra familar with the language itself but it definitely happens at other functions as well
Member

Just for context, the CMake language is actually two stacked parsers. The first is the "command" parser which grabs the command_name(bag of arguments) items. Some commands (notably the conditional-accepting ones: if, elseif, while) receive the list of arguments as parsed by this parser with quoting information (this is how CMake avoids dereferencing any bar variable in if (foo STREQUAL "bar") post-CMP0054). The rest receive a list of strings that are the arguments. Then there is the ${var_expn} parser which also handles \n escapes and such (this actually happens before the command gets its argument list).

There's also the macro…craziness where its arguments are not variables in the macro scope; the macro body has ${argname} replaced before execution, so if (macro_arg) doesn't work.

Given this, there are a few things that I think the tree-sitter-cmake project could improve:

  • treat ALL_CAP unquoted arguments as "special" and as keyword argument separators
  • properly parse predicate syntax (if and friends)
  • somehow point to unexpanded reference to macro arguments (probably needs proper LSP logic to actually use though)
Just for context, the CMake language is actually two stacked parsers. The first is the "command" parser which grabs the `command_name(bag of arguments)` items. *Some* commands (notably the conditional-accepting ones: `if`, `elseif`, `while`) receive the list of arguments as parsed by this parser with quoting information (this is how CMake avoids dereferencing any `bar` variable in `if (foo STREQUAL "bar")` post-CMP0054). The rest receive a list of strings that are the arguments. Then there is the `${var_expn}` parser which also handles `\n` escapes and such (this actually happens before the command gets its argument list). There's also the `macro`…craziness where its arguments are not variables in the macro scope; the macro body has `${argname}` replaced before execution, so `if (macro_arg)` doesn't work. Given this, there are a few things that I think the tree-sitter-cmake project could improve: - treat `ALL_CAP` unquoted arguments as "special" and as keyword argument separators - properly parse predicate syntax (`if` and friends) - somehow point to unexpanded reference to macro arguments (probably needs proper LSP logic to actually use though)
Owner

The parser seems to be actively maintained and those suggested improvement look like they would likely be useful to other users (for instance, to highlight ALL_CAPS arguments in a different way), so I imagine some of those contributions could be made to the parser, if anyone is interested.

For instance, one could adjust the regexes for the variables and introduce a new node type to represent all caps arguments. I'm not clear on how to adapt the syntax of argument lists afterwards then, because I couldn't find any distinction between capitalized and non-capitalized arguments in the language documentation. Is it that any non-capitalized element can be followed by any repetition of capitalized elements, which should be understood as options applying to that element? Something like:

argument_list: $.repeat($.argument),

argument: $.seq(
     $.non_capitalized_argument,
     $.repeat($.capitalized_unquoted_argument)
)
The parser seems to be actively maintained and those suggested improvement look like they would likely be useful to other users (for instance, to highlight ALL_CAPS arguments in a different way), so I imagine some of those contributions could be made to the parser, if anyone is interested. For instance, one could adjust the regexes for the variables and introduce a new node type to represent all caps arguments. I'm not clear on how to adapt the syntax of argument lists afterwards then, because I couldn't find any distinction between capitalized and non-capitalized arguments in [the language documentation](https://cmake.org/cmake/help/latest/manual/cmake-language.7.html). Is it that any non-capitalized element can be followed by any repetition of capitalized elements, which should be understood as options applying to that element? Something like: ``` argument_list: $.repeat($.argument), argument: $.seq( $.non_capitalized_argument, $.repeat($.capitalized_unquoted_argument) ) ```
Member

It's not a language thing; CMake is very stringly-typed. It's just convention (which, alas, is all one really has with parsing arbitrary CMake command argument lists). There are holes in that you might have all-cap strings otherwise which get caught up unintentionally (e.g., variable names tend to be all-caps too). There's also no indication of whether a keyword accepts zero, one, or any number of arguments. But I think that the all-cap heuristic is at least useful for most cases. To get it Correct™, one needs a completion engine like shells because the arguments are really just lists of strings in general.

It's not a language thing; CMake is very stringly-typed. It's just convention (which, alas, is all one really has with parsing arbitrary CMake command argument lists). There are holes in that you might have all-cap strings otherwise which get caught up unintentionally (e.g., variable names tend to be all-caps too). There's also no indication of whether a keyword accepts zero, one, or any number of arguments. But I think that the all-cap heuristic is at least useful for most cases. To get it Correct™, one needs a completion engine like shells because the arguments are really just lists of strings in general.
Author
Contributor

so the example of moving and renaming an argument does not merge either; apparently the CMake language and the parser aren't very mergiraf friendly; merge this anyway?

so the example of moving and renaming an argument does not merge either; apparently the CMake language and the parser aren't very mergiraf friendly; merge this anyway?
Owner

The PR cannot currently be merged because Codeberg says that "This pull request is broken due to missing fork information."
I'll enquire about what that means and what we can do about it.

The PR cannot currently be merged because Codeberg says that "This pull request is broken due to missing fork information." I'll enquire about what that means and what we can do about it.
Owner

"This pull request is broken due to missing fork information."

I get that from time to time as well. Force-pushing with a dummy commit tends to fix the issue for me

> "This pull request is broken due to missing fork information." I get that from time to time as well. Force-pushing with a dummy commit tends to fix the issue for me
Owner

Would you be able to try force-pushing a dummy commit on this PR branch, @OvidiusCicero?

Would you be able to try force-pushing a dummy commit on this PR branch, @OvidiusCicero?
Author
Contributor

at least for me the bad words are gone now

at least for me the bad words are gone now
wetneb merged commit 59a55bf2b4 into main 2025-08-23 17:56:04 +02:00
wetneb referenced this pull request from a commit 2025-08-23 17:56:05 +02:00
OvidiusCicero deleted branch feat/cmake_support 2025-08-23 18:26:37 +02:00
Member

@mathstuf wrote in #572 (comment):

It's not a language thing; CMake is very stringly-typed. It's just convention (which, alas, is all one really has with parsing arbitrary CMake command argument lists). There are holes in that you might have all-cap strings otherwise which get caught up unintentionally (e.g., variable names tend to be all-caps too). There's also no indication of whether a keyword accepts zero, one, or any number of arguments. But I think that the all-cap heuristic is at least useful for most cases. To get it Correct™, one needs a completion engine like shells because the arguments are really just lists of strings in general.

Note that this is actually very similar to any shell script syntax highlighting arguments starting with - differently: it is purely a convention that such arguments are "flags" and should be highlighted any differently than non-- flags. If you'd like to open a discussion upstream, I'd be happy to help, but I don't really have the cycles to lead such an effort.

@mathstuf wrote in https://codeberg.org/mergiraf/mergiraf/pulls/572#issuecomment-6662875: > It's not a language thing; CMake is very stringly-typed. It's just convention (which, alas, is all one really has with parsing arbitrary CMake command argument lists). There are holes in that you might have all-cap strings otherwise which get caught up unintentionally (e.g., variable names tend to be all-caps too). There's also no indication of whether a keyword accepts zero, one, or any number of arguments. But I think that the all-cap heuristic is at least useful for most cases. To get it Correct™, one needs a completion engine like shells because the arguments are really just lists of strings in general. Note that this is actually very similar to any shell script syntax highlighting arguments starting with `-` differently: it is purely a convention that such arguments are "flags" and should be highlighted any differently than non-`-` flags. If you'd like to open a discussion upstream, I'd be happy to help, but I don't really have the cycles to *lead* such an effort.
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: mergiraf/mergiraf#572
No description provided.