feat: Haskell support #429
No reviewers
Labels
No labels
Compat/Breaking
Kind
Bad merge
Kind
Bug
Kind
Documentation
Kind
Enhancement
Kind
Feature
Kind
New language
Kind
Security
Kind
Testing
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Reviewed
Confirmed
Reviewed
Duplicate
Reviewed
Invalid
Reviewed
Won't Fix
Status
Abandoned
Status
Blocked
Status
Need More Info
No milestone
No project
No assignees
5 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: mergiraf/mergiraf#429
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "maralorn/mergiraf:feat-haskell-support"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Hope this is fine.
I am amazed by how little effort this was and how much time this will save me and my coworkers.
Welcome, and thanks for the PR! Getting a lot of functional languages lately:)
I left some ideas for commutative parents, let me know what you think
@ -844,0 +847,4 @@
extensions: vec!["hs"],
language: tree_sitter_haskell::LANGUAGE.into(),
atomic_nodes: vec![],
commutative_parents: vec![
here's a couple more candidates for commutativity that come to mind:
function definitionsthe grammar apparently doesn't treat them as a single item, instead creating a new item for both the signature, and each pattern of a function, so something like becomes 3 separate items, whereas one would wish for them be treated as one. I'm guessing this has to do with the fact that Haskell does allow specifying just the signature of a function, without any patterns, though I don't know why that is (and even the linter warns against it?). Anyway, a bit unfortunateNum a
andShow a
can be freely reorderedShow
andEq
inderiving (Show, Eq)
. This one seems to have the grammar name oftuple
, just like the previous thing, so I wonder whether we could maketuple
s commutative parents in general? (EDIT: of course not -- for example those that define the return type of a function). Because I'm not sure there is a way in Mergiraf to say "this grammar element is only commutative in this context"Just a
andNothing
inMaybe a = Just a | Nothing
haskell
a commutative parent restricted topragma
children, see Adding children groupsof course, underspecifying commutative parents is never wrong, so you might be inclined to merge this PR as is and extend the
LangProfile
as the need arisesGood point! Allow me to just jump in here and heckle as a veteran Haskell user ;)
data Bool = True | False deriving (Eq, Ord)
we haveFalse > True
, but if we were to swap the order, the derivedOrd
instance would turn around.Another language feature that should be commutative: Record updates.
Yes of course, what I meant was having the entire function definitions be commutative among themselves, so have this:
be able to be reordered to this:
Oh, right. We do have a similar thing in Rust as well, but we choose to be lenient there, since otherwise we'd produce a lot of false-negatives.
What we've been wanting for a while now is to have a way to let the user customize the default config, in particular to make the commutativity rules more or less strict.
Ah apologies, yes, that makes sense.
Extremely nitpicky observation but this is in fact not universally true. The order in which constraints appears affects the order in which type variables are implicitly quantified, which is visible when using type applications.
won't compile if you swap the constraints
You could probably get away with ignoring this in mergiraf though, it maybe doesn't need to be 100%.
@michaelpj oh, thanks for sharing this! I think it may warrant removing the commutativity actually – let me explain, using Rust as a comparison point:
So from what you say, it looks like Haskell uses typeclass constraints as a way to get the order of not only the constraints themselves, but also the type variables.
In Rust, these are specified separately, and so the function from your example would like something like this:
with
T: Debug
being a trait bound (=typeclass constraint).And if there were more bounds on
T
, that would've looked likeT: Debug + Display
. Notice that the +-seprated list of bounds is fully commutative – reordering it doesn't make any observable difference. That's why we have them as commutative in the language profile for Rust.But the individual type parameters (=type variables) are not commutative, for a reason similar to Haskell – if one wanted to specify
T
, one would do this:which wouldn't work if you reordered
T
andU
in the function signature.In fact I did at one point open a PR that added commutativity to type parameters (#382), and @wetneb argued it's a bad idea with an example similar to the one above.
So Rust does split these two things, which allows us to make one of them commutative and not the other one. But Haskell doesn't, which leads me to the point that we should in fact disable commutativity for the whole thing (type variables), since one part of them corresponds to something we have disabled commutativity for in Rust.
@michaelpj @wetneb what do you think?
Arguably, if the compact conflict representation is used, understanding the conflict and resolving it manually shouldn't be too difficult. But that's assuming that the user does remember about type applications while resolving, and doesn't think to themselves "this is a trivial conflict, why didn't Mergiraf resolve it for me". One way to deal with that would be to give an explanation (as a log message) when creating such conflicts, but I have no idea how one would go about implementing that...
The good news is that I couldn't implement that specific commutativity because the Haskell treesitter grammar is not suited to it. So no need to remove anything. 🥳
hooray.. I guess 😅
All excellent ideas. Especially the type class constraints actually frequently produce merge errors in our codebase.
I will try to get to it quickly but imo this PR is self contained and doing the next step in follow up PRs seems like good practice?
I am curious of the function order thing. To correctly deal with this we need to be able to say something like "all functions, bindings and signatures commute" (which we can) "except if two functions have the same base name, then they don’t commute". Can we do the later in mergiraf?
Sure, yes! We're unlikely to make a new release soon, so you should have enough to complete all the follow-up PRs
I'm afraid we can't (currently), no. But I still think bundling several nodes together for the purpose of commutative merging could be something we could generalize beyond Haskell. In fact, this kind of reminds of the problem of associating comments/attributes with the nodes they annotate. @wetneb maybe you have some ideas?
@ada4a wrote in #429 (comment):
Hmm, I'm not sure how this could fall into the same sort of bundling heuristic as the one for comments, it seems quite different to me. But it's useful food for thought!
Let's merge this already, more commutativity can be added in follow-up PRs :)
@wetneb wrote in #429 (comment):
What I was thinking is that, just as with atomic nodes we say "treat this node and all its children as a single leaf node", and with the comments we (want to) say "treat this comment node and the node following it as a single leaf node", we could say in Haskell "treat this signature node, and the following pattern nodes, as a single leaf node" – so in general we could have a AST postprocessing stage where we bundle sibling nodes this way.
Of course it would be much nicer if the grammar did this for us, similar to how all the imports are apparently grouped under
imports
in the Haskell grammar (probably partially because the language requires them all to be at the beginning of the file), but in the absence of that...Ok I see, but I think I wouldn't want the comment bundling heuristic to be something that treats the comment and the node it attaches to as a leaf, because that would forbid any conflict resolution in it (such as, in an entire method).
Sadly I have struggle adding the other features:
Lets look at deriving:
the CommutativeParent here would be "tuple".
These is the typeclass constraints context:
same here.
And in general, tuple elements very much do not commutate.
Any hints as to how I could pin this down? e.g. I would need to match on the field name of the commutative parent?
Yeah, that's not supported in Mergiraf at the moment.
One way of doing it would be to modify the grammar so that not the general
tuple
node in this location, but a more specific one. Arguably, this probably more correct, because surely you can't use any sort of tuples there? Something like(1.4, "a")
wouldn't make sense, right? In my limited experience, it's difficult to make the case for those sorts of changes in tree-sitter parsers, because they are designed for highlighting, and so people don't care too much about the grammar accepting invalid inputs. But you can still try and propose it, perhaps. If it doesn't get accepted, we can still consider forking it (just like I did recently with the rust grammar, https://codeberg.org/grammar-orchard/tree-sitter-rust-orchard).Another approach would be to change mergiraf so that it accepts a more complicated selector for commutative parents, but that's quite a lot of effort and I'm not sure it's worth it.
@wetneb wrote in #429 (comment):
Fair enough. We probably want not atomicity, but rather to collect these nodes under a virtual parent node, like this:
PS: I so wish we had comment threads on Codeberg 😅
@wetneb wrote in #429 (comment):
Funnily enough, the first thing @maralorn added,
import_list
, is an example of this -- it's basically a tuple, but still gets its separate grammar type.Speaking of which -- @maralorn, I think you'll want to define signatures for
import_name
s, similarly to record fields (as mentioned at #431 (comment)). Because currently, this:would get resolved as
since Mergiraf doesn't equate the two
baz
sSadly it is more complicated than that.
We could have
which should resolve to
I am not sure if I can manage to find a signature for that … I will have a look at which options we have for signatures. At least the simple case of deduplicating names should be possible.
At least double imports are not an error but just a warning in Haskell and can get cleaned by a formatter.
The easiest is just to use the whole child as signature, which will ensure you don't end up with two identical imports at the same level.
So far, Mergiraf will only generate something like
import Foo(Bar(bar),Foo(baz),foo)
, but perhaps your linter can normalize that intoimport Foo(Bar(bar,baz),foo)
.I added all the mentioned signatures to the PR in #431.
__all__
declarations #443