feat: Haskell support #429

Merged
wetneb merged 1 commit from maralorn/mergiraf:feat-haskell-support into main 2025-06-05 12:51:25 +02:00
Contributor

Hope this is fine.

I am amazed by how little effort this was and how much time this will save me and my coworkers.

Hope this is fine. I am amazed by how little effort this was and how much time this will save me and my coworkers.
feat: Haskell support
All checks were successful
/ test (pull_request) Successful in 1m42s
da02862fd8
ada4a left a comment
Owner

Welcome, and thanks for the PR! Getting a lot of functional languages lately:)

I left some ideas for commutative parents, let me know what you think

Welcome, and thanks for the PR! Getting a lot of functional languages lately:) I left some ideas for commutative parents, let me know what you think
@ -844,0 +847,4 @@
extensions: vec!["hs"],
language: tree_sitter_haskell::LANGUAGE.into(),
atomic_nodes: vec![],
commutative_parents: vec![
Owner

here's a couple more candidates for commutativity that come to mind:

  1. function definitions the grammar apparently doesn't treat them as a single item, instead creating a new item for both the signature, and each pattern of a function, so something like
    len :: [a] -> Int
    len [] = 0
    len (x:xs) = ...
    
    becomes 3 separate items, whereas one would wish for them be treated as one. I'm guessing this has to do with the fact that Haskell does allow specifying just the signature of a function, without any patterns, though I don't know why that is (and even the linter warns against it?). Anyway, a bit unfortunate
  2. typeclass.. bounds? I don't know what the correct term for this is in Haskell -- I mean the things that are like Rust's trait bounds:
    foo :: (Num a, Show a) => a -> a
    
    here, I'm pretty sure Num a and Show a can be freely reordered
  3. Show and Eq in deriving (Show, Eq). This one seems to have the grammar name of tuple, just like the previous thing, so I wonder whether we could make tuples commutative parents in general? (EDIT: of course not -- for example those that define the return type of a function). Because I'm not sure there is a way in Mergiraf to say "this grammar element is only commutative in this context"
  4. dataclass constructors, e.g. Just a and Nothing in Maybe a = Just a | Nothing
  5. pragmas specified at the top of a file (maybe elsewhere as well?):
    └haskell
      ├pragma {-# LANGUAGE NumericUnderscores #-}
      └pragma {-# OPTIONS_GHC -Wno-unrecognised-pragmas #-}
    
    this will require making haskell a commutative parent restricted to pragma children, see Adding children groups

of course, underspecifying commutative parents is never wrong, so you might be inclined to merge this PR as is and extend the LangProfile as the need arises

here's a couple more candidates for commutativity that come to mind: 1. ~~function definitions~~ the grammar apparently doesn't treat them as a single item, instead creating a new item for both the signature, and each pattern of a function, so something like ```hs len :: [a] -> Int len [] = 0 len (x:xs) = ... ``` becomes 3 separate items, whereas one would wish for them be treated as one. I'm guessing this has to do with the fact that Haskell does allow specifying just the signature of a function, without any patterns, though I don't know why that is (and even the linter warns against it?). Anyway, a bit unfortunate 2. typeclass.. bounds? I don't know what the correct term for this is in Haskell -- I mean the things that are like Rust's trait bounds: ```hs foo :: (Num a, Show a) => a -> a ``` here, I'm pretty sure `Num a` and `Show a` can be freely reordered 3. `Show` and `Eq` in `deriving (Show, Eq)`. This one seems to have the grammar name of `tuple`, just like the previous thing, so I wonder whether we could make `tuple`s commutative parents in general? (EDIT: of course not -- for example those that define the return type of a function). Because I'm not sure there is a way in Mergiraf to say "this grammar element is only commutative in this context" 4. dataclass constructors, e.g. `Just a` and `Nothing` in `Maybe a = Just a | Nothing` 5. pragmas specified at the top of a file (maybe elsewhere as well?): ``` └haskell ├pragma {-# LANGUAGE NumericUnderscores #-} └pragma {-# OPTIONS_GHC -Wno-unrecognised-pragmas #-} ``` this will require making `haskell` a commutative parent restricted to `pragma` children, see [Adding children groups](https://mergiraf.org/adding-a-language.html#adding-children-groups) of course, underspecifying commutative parents is never wrong, so you might be inclined to merge this PR as is and extend the `LangProfile` as the need arises
First-time contributor

Good point! Allow me to just jump in here and heckle as a veteran Haskell user ;)

  1. Unfortunately pattern matches aren't commutative, earlier patterns take precedence over later ones.
  2. Yes, that makes a lot of sense! (We call them "typeclass constraints" 🤷)
  3. Again, this should also work.
  4. The order for constructors subtly matters for derived instances. E.g. in data Bool = True | False deriving (Eq, Ord) we have False > True, but if we were to swap the order, the derived Ord instance would turn around.
  5. Yes, this makes sense.
Good point! Allow me to just jump in here and heckle as a veteran Haskell user ;) 1. Unfortunately pattern matches aren't commutative, earlier patterns take precedence over later ones. 2. Yes, that makes a lot of sense! (We call them "typeclass constraints" 🤷) 3. Again, this should also work. 4. The order for constructors subtly matters for derived instances. E.g. in `data Bool = True | False deriving (Eq, Ord)` we have `False > True`, but if we were to swap the order, the derived `Ord` instance would turn around. 5. Yes, this makes sense.
First-time contributor

Another language feature that should be commutative: Record updates.

foo = bar { field1 = x, field2 = y }
Another language feature that should be commutative: Record updates. ```haskell foo = bar { field1 = x, field2 = y } ```
Owner

Unfortunately pattern matches aren't commutative, earlier patterns take precedence over later ones.

Yes of course, what I meant was having the entire function definitions be commutative among themselves, so have this:

foo :: a -> a
foo = id

len :: [a] -> Int
len [] = 0
len (x:xs) = 1 + len xs

be able to be reordered to this:

len :: [a] -> Int
len [] = 0
len (x:xs) = 1 + len xs

foo :: a -> a
foo = id

The order for constructors subtly matters for derived instances. E.g. in data Bool = True | False deriving (Eq, Ord) we have False > True, but if we were to swap the order, the derived Ord instance would turn around.

Oh, right. We do have a similar thing in Rust as well, but we choose to be lenient there, since otherwise we'd produce a lot of false-negatives.

What we've been wanting for a while now is to have a way to let the user customize the default config, in particular to make the commutativity rules more or less strict.

> Unfortunately pattern matches aren't commutative, earlier patterns take precedence over later ones. Yes of course, what I meant was having the entire function definitions be commutative among themselves, so have this: ```hs foo :: a -> a foo = id len :: [a] -> Int len [] = 0 len (x:xs) = 1 + len xs ``` be able to be reordered to this: ```hs len :: [a] -> Int len [] = 0 len (x:xs) = 1 + len xs foo :: a -> a foo = id ``` > The order for constructors subtly matters for derived instances. E.g. in data Bool = True | False deriving (Eq, Ord) we have False > True, but if we were to swap the order, the derived Ord instance would turn around. Oh, right. We do have a similar thing in Rust as well, but we choose to be lenient there, since otherwise we'd produce a lot of false-negatives. What we've been wanting for a while now is to have a way to let the user customize the default config, in particular to make the commutativity rules more or less strict.
First-time contributor

Ah apologies, yes, that makes sense.

Ah apologies, yes, that makes sense.
First-time contributor

here, I'm pretty sure Num a and Show a can be freely reordered

Extremely nitpicky observation but this is in fact not universally true. The order in which constraints appears affects the order in which type variables are implicitly quantified, which is visible when using type applications.

foo :: (Show a, Num b) => a -> b -> a

let x :: Int = foo @Int 1

won't compile if you swap the constraints

> here, I'm pretty sure Num a and Show a can be freely reordered Extremely nitpicky observation but this is in fact not universally true. The order in which constraints appears affects the order in which type variables are implicitly quantified, which is visible when using type applications. ``` foo :: (Show a, Num b) => a -> b -> a let x :: Int = foo @Int 1 ``` won't compile if you swap the constraints
First-time contributor

You could probably get away with ignoring this in mergiraf though, it maybe doesn't need to be 100%.

You could probably get away with ignoring this in mergiraf though, it maybe doesn't need to be 100%.
Owner

@michaelpj oh, thanks for sharing this! I think it may warrant removing the commutativity actually – let me explain, using Rust as a comparison point:

So from what you say, it looks like Haskell uses typeclass constraints as a way to get the order of not only the constraints themselves, but also the type variables.

In Rust, these are specified separately, and so the function from your example would like something like this:

fn foo<T: Debug, U: Num>(t: T, u: U) -> T {}

with T: Debug being a trait bound (=typeclass constraint).

And if there were more bounds on T, that would've looked like T: Debug + Display. Notice that the +-seprated list of bounds is fully commutative – reordering it doesn't make any observable difference. That's why we have them as commutative in the language profile for Rust.

But the individual type parameters (=type variables) are not commutative, for a reason similar to Haskell – if one wanted to specify T, one would do this:

let a = foo::<u32, _>(1);

which wouldn't work if you reordered T and U in the function signature.

In fact I did at one point open a PR that added commutativity to type parameters (#382), and @wetneb argued it's a bad idea with an example similar to the one above.

So Rust does split these two things, which allows us to make one of them commutative and not the other one. But Haskell doesn't, which leads me to the point that we should in fact disable commutativity for the whole thing (type variables), since one part of them corresponds to something we have disabled commutativity for in Rust.

@michaelpj @wetneb what do you think?

Arguably, if the compact conflict representation is used, understanding the conflict and resolving it manually shouldn't be too difficult. But that's assuming that the user does remember about type applications while resolving, and doesn't think to themselves "this is a trivial conflict, why didn't Mergiraf resolve it for me". One way to deal with that would be to give an explanation (as a log message) when creating such conflicts, but I have no idea how one would go about implementing that...

@michaelpj oh, thanks for sharing this! I think it may warrant removing the commutativity actually – let me explain, using Rust as a comparison point: So from what you say, it looks like Haskell uses typeclass constraints as a way to get the order of not only the constraints themselves, but also the type variables. In Rust, these are specified separately, and so the function from your example would like something like this: ```rs fn foo<T: Debug, U: Num>(t: T, u: U) -> T {} ``` with `T: Debug` being a trait bound (=typeclass constraint). And if there were more bounds on `T`, that would've looked like `T: Debug + Display`. Notice that the +-seprated list of bounds is fully commutative – reordering it doesn't make any observable difference. That's why we have them as commutative in the language profile for Rust. But the individual type parameters (=type variables) are not commutative, for a reason similar to Haskell – if one wanted to specify `T`, one would do this: ```rs let a = foo::<u32, _>(1); ``` which wouldn't work if you reordered `T` and `U` in the function signature. In fact I did at one point open a PR that added commutativity to type parameters (#382), and @wetneb argued it's a bad idea with an example similar to the one above. So Rust does split these two things, which allows us to make one of them commutative and not the other one. But Haskell doesn't, which leads me to the point that we should in fact disable commutativity for the whole thing (type variables), since one part of them corresponds to something we have disabled commutativity for in Rust. @michaelpj @wetneb what do you think? Arguably, if the compact conflict representation is used, understanding the conflict and resolving it manually shouldn't be too difficult. But that's assuming that the user does remember about type applications while resolving, and doesn't think to themselves "this is a trivial conflict, why didn't Mergiraf resolve it for me". One way to deal with that would be to give an explanation (as a log message) when creating such conflicts, but I have no idea how one would go about implementing that...
Author
Contributor

The good news is that I couldn't implement that specific commutativity because the Haskell treesitter grammar is not suited to it. So no need to remove anything. 🥳

The good news is that I couldn't implement that specific commutativity because the Haskell treesitter grammar is not suited to it. So no need to remove anything. 🥳
Owner

hooray.. I guess 😅

hooray.. I guess 😅
Author
Contributor

All excellent ideas. Especially the type class constraints actually frequently produce merge errors in our codebase.

I will try to get to it quickly but imo this PR is self contained and doing the next step in follow up PRs seems like good practice?

I am curious of the function order thing. To correctly deal with this we need to be able to say something like "all functions, bindings and signatures commute" (which we can) "except if two functions have the same base name, then they don’t commute". Can we do the later in mergiraf?

All excellent ideas. Especially the type class constraints actually frequently produce merge errors in our codebase. I will try to get to it quickly but imo this PR is self contained and doing the next step in follow up PRs seems like good practice? I am curious of the function order thing. To correctly deal with this we need to be able to say something like "all functions, bindings and signatures commute" (which we can) "except if two functions have the same base name, then they don’t commute". Can we do the later in mergiraf?
Owner

I will try to get to it quickly but imo this PR is self contained and doing the next step in follow up PRs seems like good practice?

Sure, yes! We're unlikely to make a new release soon, so you should have enough to complete all the follow-up PRs

Can we do the later in mergiraf?

I'm afraid we can't (currently), no. But I still think bundling several nodes together for the purpose of commutative merging could be something we could generalize beyond Haskell. In fact, this kind of reminds of the problem of associating comments/attributes with the nodes they annotate. @wetneb maybe you have some ideas?

> I will try to get to it quickly but imo this PR is self contained and doing the next step in follow up PRs seems like good practice? Sure, yes! We're unlikely to make a new release soon, so you should have enough to complete all the follow-up PRs > Can we do the later in mergiraf? I'm afraid we can't (currently), no. But I still think bundling several nodes together for the purpose of commutative merging could be something we could generalize beyond Haskell. In fact, this kind of reminds of the problem of [associating comments/attributes with the nodes they annotate](https://codeberg.org/mergiraf/mergiraf/issues/265#issuecomment-3586523). @wetneb maybe you have some ideas?
Owner

@ada4a wrote in #429 (comment):

I'm afraid we can't (currently), no. But I still think bundling several nodes together for the purpose of commutative merging could be something we could generalize beyond Haskell. In fact, this kind of reminds of the problem of associating comments/attributes with the nodes they annotate. @wetneb maybe you have some ideas?

Hmm, I'm not sure how this could fall into the same sort of bundling heuristic as the one for comments, it seems quite different to me. But it's useful food for thought!

@ada4a wrote in https://codeberg.org/mergiraf/mergiraf/pulls/429#issuecomment-5006535: > I'm afraid we can't (currently), no. But I still think bundling several nodes together for the purpose of commutative merging could be something we could generalize beyond Haskell. In fact, this kind of reminds of the problem of [associating comments/attributes with the nodes they annotate](https://codeberg.org/mergiraf/mergiraf/issues/265#issuecomment-3586523). @wetneb maybe you have some ideas? Hmm, I'm not sure how this could fall into the same sort of bundling heuristic as the one for comments, it seems quite different to me. But it's useful food for thought!
wetneb approved these changes 2025-06-05 12:51:11 +02:00
wetneb left a comment
Owner

Let's merge this already, more commutativity can be added in follow-up PRs :)

Let's merge this already, more commutativity can be added in follow-up PRs :)
wetneb merged commit 7041d55931 into main 2025-06-05 12:51:25 +02:00
wetneb referenced this pull request from a commit 2025-06-05 12:51:26 +02:00
Owner

@wetneb wrote in #429 (comment):

Hmm, I'm not sure how this could fall into the same sort of bundling heuristic as the one for comments, it seems quite different to me.

What I was thinking is that, just as with atomic nodes we say "treat this node and all its children as a single leaf node", and with the comments we (want to) say "treat this comment node and the node following it as a single leaf node", we could say in Haskell "treat this signature node, and the following pattern nodes, as a single leaf node" – so in general we could have a AST postprocessing stage where we bundle sibling nodes this way.

Of course it would be much nicer if the grammar did this for us, similar to how all the imports are apparently grouped under imports in the Haskell grammar (probably partially because the language requires them all to be at the beginning of the file), but in the absence of that...

@wetneb wrote in https://codeberg.org/mergiraf/mergiraf/pulls/429#issuecomment-5007516: > Hmm, I'm not sure how this could fall into the same sort of bundling heuristic as the one for comments, it seems quite different to me. What I was thinking is that, just as with atomic nodes we say "treat this node and all its children as a single leaf node", and with the comments we (want to) say "treat this comment node and the node following it as a single leaf node", we could say in Haskell "treat this signature node, and the following pattern nodes, as a single leaf node" – so in general we could have a AST postprocessing stage where we bundle sibling nodes this way. Of course it would be much nicer if the grammar did this for us, similar to how all the imports are apparently grouped under `imports` in the Haskell grammar (probably partially because the language requires them all to be at the beginning of the file), but in the absence of that...
Owner

Ok I see, but I think I wouldn't want the comment bundling heuristic to be something that treats the comment and the node it attaches to as a leaf, because that would forbid any conflict resolution in it (such as, in an entire method).

Ok I see, but I think I wouldn't want the comment bundling heuristic to be something that treats the comment and the node it attaches to as a *leaf*, because that would forbid any conflict resolution in it (such as, in an entire method).
Author
Contributor

Sadly I have struggle adding the other features:

Lets look at deriving:

    │ └deriving: deriving
    │   ├deriving
    │   └classes: tuple
    │     ├(
    │     ├name Eq
    │     ├,
    │     ├name Ord
    │     ├,
    │     ├name Show
    │     └)

the CommutativeParent here would be "tuple".
These is the typeclass constraints context:

    │ └type: context
    │   ├context: tuple
    │   │ ├(
    │   │ ├apply
    │   │ │ ├constructor: name Eq
    │   │ │ └argument: variable a
    │   │ ├,
    │   │ ├apply
    │   │ │ ├constructor: name Ord
    │   │ │ └argument: variable a
    │   │ └)
    │   ├arrow: =>

same here.

And in general, tuple elements very much do not commutate.

Any hints as to how I could pin this down? e.g. I would need to match on the field name of the commutative parent?

Sadly I have struggle adding the other features: Lets look at deriving: ``` │ └deriving: deriving │ ├deriving │ └classes: tuple │ ├( │ ├name Eq │ ├, │ ├name Ord │ ├, │ ├name Show │ └) ``` the CommutativeParent here would be "tuple". These is the typeclass constraints context: ``` │ └type: context │ ├context: tuple │ │ ├( │ │ ├apply │ │ │ ├constructor: name Eq │ │ │ └argument: variable a │ │ ├, │ │ ├apply │ │ │ ├constructor: name Ord │ │ │ └argument: variable a │ │ └) │ ├arrow: => ``` same here. And in general, tuple elements very much do not commutate. Any hints as to how I could pin this down? e.g. I would need to match on the field name of the commutative parent?
maralorn deleted branch feat-haskell-support 2025-06-05 15:30:13 +02:00
Owner

Yeah, that's not supported in Mergiraf at the moment.

One way of doing it would be to modify the grammar so that not the general tuple node in this location, but a more specific one. Arguably, this probably more correct, because surely you can't use any sort of tuples there? Something like (1.4, "a") wouldn't make sense, right? In my limited experience, it's difficult to make the case for those sorts of changes in tree-sitter parsers, because they are designed for highlighting, and so people don't care too much about the grammar accepting invalid inputs. But you can still try and propose it, perhaps. If it doesn't get accepted, we can still consider forking it (just like I did recently with the rust grammar, https://codeberg.org/grammar-orchard/tree-sitter-rust-orchard).

Another approach would be to change mergiraf so that it accepts a more complicated selector for commutative parents, but that's quite a lot of effort and I'm not sure it's worth it.

Yeah, that's not supported in Mergiraf at the moment. One way of doing it would be to modify the grammar so that not the general `tuple` node in this location, but a more specific one. Arguably, this probably more correct, because surely you can't use any sort of tuples there? Something like `(1.4, "a")` wouldn't make sense, right? In my limited experience, it's difficult to make the case for those sorts of changes in tree-sitter parsers, because they are designed for highlighting, and so people don't care too much about the grammar accepting invalid inputs. But you can still try and propose it, perhaps. If it doesn't get accepted, we can still consider forking it (just like I did recently with the rust grammar, https://codeberg.org/grammar-orchard/tree-sitter-rust-orchard). Another approach would be to change mergiraf so that it accepts a more complicated selector for commutative parents, but that's quite a lot of effort and I'm not sure it's worth it.
Owner

@wetneb wrote in #429 (comment):

I think I wouldn't want the comment bundling heuristic to be something that treats the comment and the node it attaches to as a leaf, because that would forbid any conflict resolution in it (such as, in an entire method).

Fair enough. We probably want not atomicity, but rather to collect these nodes under a virtual parent node, like this:

function
|- signature
|- pattern
|- pattern
|- ...

PS: I so wish we had comment threads on Codeberg 😅

@wetneb wrote in https://codeberg.org/mergiraf/mergiraf/pulls/429#issuecomment-5008872: > I think I wouldn't want the comment bundling heuristic to be something that treats the comment and the node it attaches to as a _leaf_, because that would forbid any conflict resolution in it (such as, in an entire method). Fair enough. We probably want not atomicity, but rather to collect these nodes under a virtual parent node, like this: ``` function |- signature |- pattern |- pattern |- ... ``` PS: I so wish we had comment threads on Codeberg 😅
Owner

@wetneb wrote in #429 (comment):

One way of doing it would be to modify the grammar so that not the general tuple node in this location, but a more specific one. Arguably, this probably more correct, because surely you can't use any sort of tuples there? Something like (1.4, "a") wouldn't make sense, right?

Funnily enough, the first thing @maralorn added, import_list, is an example of this -- it's basically a tuple, but still gets its separate grammar type.

Speaking of which -- @maralorn, I think you'll want to define signatures for import_names, similarly to record fields (as mentioned at #431 (comment)). Because currently, this:

<<<<<<< LEFT
import Foo(bar,baz)
||||||| BASE
import Foo(bar)
=======
import Foo(baz,bar)
>>>>>>> RIGHT

would get resolved as

import Foo(baz,bar,baz)

since Mergiraf doesn't equate the two bazs

@wetneb wrote in https://codeberg.org/mergiraf/mergiraf/pulls/429#issuecomment-5009193: > One way of doing it would be to modify the grammar so that not the general `tuple` node in this location, but a more specific one. Arguably, this probably more correct, because surely you can't use any sort of tuples there? Something like `(1.4, "a")` wouldn't make sense, right? Funnily enough, the first thing @maralorn added, `import_list`, is an example of this -- it's basically a tuple, but still gets its separate grammar type. Speaking of which -- @maralorn, I think you'll want to define signatures for `import_name`s, similarly to record fields (as mentioned at https://codeberg.org/mergiraf/mergiraf/pulls/431#issuecomment-5009421). Because currently, this: ```hs <<<<<<< LEFT import Foo(bar,baz) ||||||| BASE import Foo(bar) ======= import Foo(baz,bar) >>>>>>> RIGHT ``` would get resolved as ```hs import Foo(baz,bar,baz) ``` since Mergiraf doesn't equate the two `baz`s
Author
Contributor

Sadly it is more complicated than that.
We could have

<<<<<<< LEFT
import Foo(foo,Bar(baz))
||||||| BASE
import Foo(foo)
=======
import Foo(Bar(bar),foo)
>>>>>>> RIGHT

which should resolve to

import Foo(Bar(bar,baz),foo)

I am not sure if I can manage to find a signature for that … I will have a look at which options we have for signatures. At least the simple case of deduplicating names should be possible.

At least double imports are not an error but just a warning in Haskell and can get cleaned by a formatter.

Sadly it is more complicated than that. We could have ``` <<<<<<< LEFT import Foo(foo,Bar(baz)) ||||||| BASE import Foo(foo) ======= import Foo(Bar(bar),foo) >>>>>>> RIGHT ``` which should resolve to ``` import Foo(Bar(bar,baz),foo) ``` I am not sure if I can manage to find a signature for that … I will have a look at which options we have for signatures. At least the simple case of deduplicating names should be possible. At least double imports are not an error but just a warning in Haskell and can get cleaned by a formatter.
Owner

The easiest is just to use the whole child as signature, which will ensure you don't end up with two identical imports at the same level.

So far, Mergiraf will only generate something like import Foo(Bar(bar),Foo(baz),foo), but perhaps your linter can normalize that into import Foo(Bar(bar,baz),foo).

The easiest is just to use the whole child as signature, which will ensure you don't end up with two identical imports at the same level. So far, Mergiraf will only generate something like `import Foo(Bar(bar),Foo(baz),foo)`, but perhaps your linter can normalize that into `import Foo(Bar(bar,baz),foo)`.
Author
Contributor

I added all the mentioned signatures to the PR in #431.

I added all the mentioned signatures to the PR in #431.
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
5 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: mergiraf/mergiraf#429
No description provided.