Roadmap?

wetneb commented

2025-02-10 11:36:13 +01:00

Owner

I wonder if it would be worth writing up a sort of roadmap. This is a recommendation I have often heard for FOSS projects, and I am interested in the exercise. It often felt difficult for me because I didn't want to constrain the contributions of others into a predefined plan, but I have also heard it from others that it actually helps them to understand the direction a FOSS project is heading towards, to help them gauge whether they want to get involved or not, and how they can help.

Would this be something you'd be interested in, @ada4a @funkeleinhorn @zivarah? (and others!)

We could ask ourselves questions such as:

- What do you think should be worked on as a priority on this project? (not necessarily done by you)
- What are you interested in working on? (not necessarily high-priority tasks)
- Mergiraf 1.0 - what would that look like, what is missing to get there?

Let's see if we have things to say on those topics, and whether we want to write something up somewhere.


wetneb added the Kind/Documentation label

2025-02-10 11:37:45 +01:00

ada4a commented

2025-02-10 17:09:01 +01:00

Owner

I think this is a good idea in general. But doing this can be somewhat tricky for Mergiraf: as a merge driver, it's almost by definition supposed to get out of your way during normal operation, and if you do find yourself interacting with it, that's because of some kind of mistake it made (e.g. false-positive/negative merge results). But I guess that's true for most utility programs...

Anyway, here's what my answers to the questions would be:

What do you think should be worked on as a priority on this project?

- I began writing an answer, then realized the paragraph had become kind of big and I could just turn it into an issue... so I did just that: #198
- Setting up good benchmarks for identifying actual performance choke points. I've mostly been following my gut feeling during the numerous PRs, but a) I think I've fixed pretty much all of the obvious stuff, and b) it would be great to have something concrete to base the work on. #39 and #40 are good examples of what I'd love to see more of.
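As a starting point for the kind of benchmark mentioned above, here is a minimal timing sketch using only the standard library. The function name `median_runtime` is hypothetical and not part of Mergiraf; a real setup would more likely rely on a dedicated benchmarking harness and on representative merge scenarios.

```rust
use std::time::{Duration, Instant};

/// Runs `work` `runs` times and returns the median wall-clock duration.
/// Returns `None` when `runs` is zero. For an even number of runs, the
/// upper of the two middle samples is returned.
/// Hypothetical helper for illustration, not a Mergiraf API.
pub fn median_runtime<F: FnMut()>(runs: usize, mut work: F) -> Option<Duration> {
    if runs == 0 {
        return None;
    }
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed()
        })
        .collect();
    // The median is less sensitive to one-off outliers (cold caches, scheduling) than the mean.
    samples.sort();
    Some(samples[samples.len() / 2])
}
```

The closure passed as `work` could wrap a call into the merging code on a fixed set of input files, so that two revisions of the code can be compared on the same cases.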

What are you interested in working on?

I'm not sure I understand the question completely. You mean us, the maintainers? Not sure that that'd be too interesting for people reading through the roadmap 😅

What I've been enjoying lately is investigating some of the buggy merges by following the logic flow across the functions -- it feels like a small adventure! Maybe it would make sense to compile some Git repositories and analyze some of the merge commits -- I guess you already set out to do that in #197.

Mergiraf 1.0 - what would that look like, what is missing to get there?

You mentioned a couple of times that you'd like to keep false-positives to a minimum, even at the cost of false-negatives. A lot of official Rust tooling follows this strategy as well, and I think that's very important indeed. From my limited experience with Mergiraf 0.5.0, #89 seems to have been a big step in that direction. All in all, I think that might be the most important thing to focus on.

I'd cautiously say that everything else seems to be quite good already?


👍 1

wetneb commented

2025-02-10 18:41:55 +01:00

Author

Owner

Thanks, that's super insightful!

What are you interested in working on?

I'm not sure I understand the question completely. You mean us, the maintainers? Not sure that that'd be too interesting for people reading through the roadmap 😅

Why wouldn't that be interesting? As a potential contributor, I would find it useful to know what people intend to work on, so I can focus on other things. As a user, I'd be interested to know what direction is given to the project by the people "in charge"… realistically, the stuff we are motivated to work on ourselves is a better representation of the future of the tool than what we think should be the priority ^^

For instance, the Gitea project does their "roadmap" like this, only letting people announce what they intend to work on themselves (https://github.com/go-gitea/gitea/issues/32877).


ada4a commented

2025-02-10 18:55:29 +01:00

Owner

Thanks, that's super insightful!

Glad to hear that :)

Why wouldn't that be interesting? As a potential contributor, I would find it useful to know what people intend to work on, so I can focus on other things. As a user, I'd be interested to know what direction is given to the project by the people "in charge"… realistically, the stuff we are motivated to work on ourselves is a better representation of the future of the tool than what we think should be the priority ^^

For instance, the Gitea project does their "roadmap" like this, only letting people announce what they intend to work on themselves.

I've just never seen such a thing before to be honest; but I see your point, and the example of Gitea is helpful as well. So, sure!


wetneb commented

2025-02-15 14:47:31 +01:00

Author

Owner

So let's try and answer my own questions too :)

What do you think should be worked on as a priority on this project? (not necessarily done by you)

- Reviewing Mergiraf's output on more real-world merge cases and fixing the issues that arise. I've done a fair bit of that during the initial development, based on Spork's own evaluation test suite, but one could spend a lot more time on it; it seemingly never ends! The AST merging evaluation campaign (https://github.com/benedikt-schesch/AST-Merging-Evaluation), which now also includes Mergiraf 0.3, is another good source of interesting real-world use cases. Those large-scale evaluation campaigns are also useful to detect panics, timeouts and other catastrophic failures.
- It would be great to have an automated release process (#144). I see it as an important way to enable more people to get involved in the project, so that I don't hoard the responsibility of publishing releases myself.

What are you interested in working on? (not necessarily high-priority tasks)

- Improving the Rust grammar we rely on, to help with commutative merging of methods despite the presence of attributes or documentation comments… sadly the tree-sitter-rust grammar does not seem to be actively accepting external contributions, so it's a bit blocked by that. Forking doesn't seem that bad though.
- I would find it amazing to have Mergiraf packaged on Debian-based systems. I've started a small hobby of packaging the crates we depend on, one after the other, so that in a few years we can hope to have Mergiraf packaged too (it takes a long time because uploading the packages relies on finding Debian developers available to sponsor). The biggest hurdle is packaging tree-sitter and the parsers, but there is independent interest in getting them packaged (for Neovim), so I am not alone there ^^

Mergiraf 1.0 - what would that look like, what is missing to get there?

I would find it useful to have a bit more of a principled approach to decide which languages/formats we should support. Ideally, we'd try to find a list of the most common formats and try to ensure that we support the top ones. It would also be good to keep an eye on the size of the resulting binary and gather a better understanding of the size costs of each language (maybe there are also ways to keep that down).

It could also be useful to do a dedicated evaluation campaign and benchmark, hopefully with conclusive results in favour of Mergiraf, to have the assurance that it's a reasonable thing to advertise to people.


👍 1

zivarah commented

2025-02-16 22:17:08 +01:00

Member

My use of mergiraf is fairly limited still as I haven't been particularly active on non-work development recently, and my employer has a strict approval process for external software so I haven't been able to use it there yet either. However, that approval finally came through, so I expect to get a chance to see mergiraf in action more in the coming weeks and hopefully provide more meaningful thoughts on the future direction.

I do have a few thoughts which I will go over here acknowledging that I am completely ignoring any technical barriers: I don't have the technical understanding of mergiraf to really understand the feasibility of these, they are just some of the things I've noticed as I've incorporated mergiraf into my system.

With each, I try to play devil's advocate a little bit and include reasons why we might not want to pursue that idea: some cans of worms are best left unopened! Please feel free to be very frank in telling me if these are bad ideas, I will not be offended =)

I can spin up issues for any of them that do seem worth pursuing.

Non-packaged language support

While mergiraf seems to be a fairly generic/well abstracted tool that doesn't require significant changes to support a new language, there are still a number of hurdles:

- There needs to be a crate for that language's parser
- That crate needs to work with the specific version of the core tree-sitter crate that mergiraf uses
- A PR needs to be opened to add official support
- There is some amount of delay while that change is reviewed, incorporated, eventually makes it into a release, and finally that release becomes available in the user's package manager of choice

Additionally, while many popular languages have been added, the number of programming languages out there is enormous and adding built-in/packaged/compiled support for every one of them is a tall order.

It seems like there could be some value in making it possible for a user to acquire a parser via some external means, and teach mergiraf how to use it with some config file: essentially a LangProfile that can be parsed at runtime. It seems this could open the door for community-maintained "plugins" per se, where the relevant config is maintained by experts in that domain and can be tweaked by a user in real time if desired (for example, to take on less risk by disallowing commutativity).

My total lack of understanding of how tree-sitter grammars work is likely on full display here: I don't know if there's any precedent for invoking a parser via an external command and getting json output on stdout or anything like that.

Reasons not to do it

- Performance: presumably this would have to be slower, as it would involve out-of-process work
- Technical complexity/maintenance overhead
- Pressure on mergiraf to support (e.g. investigate issues with) community-owned configurations that we don't own
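To make the idea of a `LangProfile` parsed at runtime a bit more concrete, here is a rough sketch. Everything in it is an assumption made for illustration: the `RuntimeLangProfile` struct, its fields and the `key = value` file format are hypothetical and do not mirror Mergiraf's actual `LangProfile`, and the question of how the parser itself would be loaded is left out entirely.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified stand-in for a language profile loaded at runtime.
/// Field names are illustrative and do not mirror Mergiraf's real `LangProfile`.
#[derive(Debug, Default, PartialEq)]
pub struct RuntimeLangProfile {
    pub name: String,
    pub extensions: BTreeSet<String>,
    /// Node types whose children may be reordered freely when merging.
    pub commutative_parents: BTreeSet<String>,
}

/// Parses an assumed `key = value` format, one entry per line.
/// List values are comma-separated; blank lines and `#` comments are skipped.
pub fn parse_profile(text: &str) -> Result<RuntimeLangProfile, String> {
    let mut profile = RuntimeLangProfile::default();
    for (idx, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected `key = value`", idx + 1))?;
        // Splits a comma-separated value into a set of trimmed, non-empty items.
        let items = || {
            value
                .split(',')
                .map(|s| s.trim().to_string())
                .filter(|s| !s.is_empty())
                .collect::<BTreeSet<String>>()
        };
        match key.trim() {
            "name" => profile.name = value.trim().to_string(),
            "extensions" => profile.extensions = items(),
            "commutative_parents" => profile.commutative_parents = items(),
            other => return Err(format!("line {}: unknown key `{}`", idx + 1, other)),
        }
    }
    if profile.name.is_empty() {
        return Err("missing `name`".to_string());
    }
    Ok(profile)
}
```

With a format along these lines, a user who wants to take on less risk could simply delete the `commutative_parents` line from their copy of a community-maintained profile.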

Repo-specific behavioral tweaks

Building on this thought from the section above:

[...] can be tweaked by a user in real time if desired (for example, to take on less risk by disallowing commutativity)

The main use case I'm imagining here is commutativity, though there could be others.

There are times when, even though a parent's children are commutative as far as the language is concerned, a certain project may have strict standards regarding how those children should be ordered, for example requiring that a set of children be sorted alphabetically.

These rules cannot be hard-coded into mergiraf's LangProfile as they would differ between projects even within the same language.

Reasons not to do it

- Perhaps better solved with post-merge tooling, e.g. running a formatter that sorts C# using statements automatically
- Easier for a user to configure their way into bad behavior accidentally
- Harder to track down bugs if user fails to provide their complete config hierarchy when reporting issues
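The ordering concern can be illustrated with a toy three-way merge of commutative children (think import lines). This is only a sketch under stated assumptions: `ChildOrder` and `merge_commutative` are hypothetical names, no such setting exists in Mergiraf as far as this thread indicates, and the real merging works on syntax trees rather than on lists of strings.

```rust
/// How a project wants commutative children ordered after a merge.
/// Hypothetical setting used for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ChildOrder {
    /// Keep surviving base children first, then additions from the left, then from the right.
    AsMerged,
    /// Sort the merged children (byte-wise lexicographic order).
    Alphabetical,
}

/// Merges three versions of a list of commutative children.
/// A base child survives only if neither side deleted it; children added
/// on either side are kept once.
pub fn merge_commutative(
    base: &[&str],
    left: &[&str],
    right: &[&str],
    order: ChildOrder,
) -> Vec<String> {
    let mut merged: Vec<String> = Vec::new();
    // Base children that both sides kept.
    for child in base {
        if left.contains(child) && right.contains(child) {
            merged.push(child.to_string());
        }
    }
    // Children added on either side, without duplicates.
    for side in [left, right] {
        for child in side {
            if !base.contains(child) && !merged.iter().any(|m| m == child) {
                merged.push(child.to_string());
            }
        }
    }
    if order == ChildOrder::Alphabetical {
        merged.sort();
    }
    merged
}
```

With `AsMerged`, additions end up in an order that depends on which side they came from, which is exactly what a project with a strict sorting convention would object to; `Alphabetical` shows what a repo-level override could do instead of a post-merge formatter.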

Static configuration

When taking a new version of mergiraf, you need to remember to re-run mergiraf languages --gitattributes and update your git config to include any new languages that you want support for. This is not really that hard, but it might be nice if it was possible to set up your git attributes to use mergiraf for all file types, and have mergiraf re-route to the built-in git handling for file types that it is unable to handle. This would make your git config/gitattributes setup a one-time thing.

Reasons not to do it

- Performance: this pass-through would presumably be slower than git just evaluating the attributes and deciding to handle it itself
- Not problematic enough to warrant any additional complexity
- It's possible that you can already do this with existing git functionality (e.g. this StackOverflow post: https://stackoverflow.com/questions/47537156/git-custom-merge-driver-with-fallback-to-the-built-in-driver). If so, we could decide between making the configuration easier by supporting it in mergiraf or just documenting that possibility in the setup instructions.


❤️ 1

ada4a commented

2025-02-21 14:14:40 +01:00

Owner

@zivarah wrote in #196 (comment):

I do have a few thoughts which I will go over here acknowledging that I am completely ignoring any technical barriers: I don't have the technical understanding of mergiraf to really understand the feasibility of these, they are just some of the things I've noticed as I've incorporated mergiraf into my system.

I think this can be beneficial sometimes! Not knowing the constraints can help to think outside the box:)

Non-packaged language support

I agree with this one very much! Unfortunately, I'm not very knowledgeable about tree-sitter grammars either, but surely we're not the first project to think about this?

An obvious example that comes to mind is nvim-treesitter (https://github.com/nvim-treesitter/nvim-treesitter) -- we could try looking at, and learning from, their approach to this problem. Though from a quick glance, it seems like they just have a maintainer per language (/ a set of languages?), who are all knowledgeable about both Neovim and Tree-sitter. Not palatable for us I fear.

Repo-specific behavioral tweaks

Yeah I fear this one might be out of scope... a formatter/linter is much better suited for this I think. I've opened an issue along similar lines myself (#198), and I think what we really need is to make Mergiraf cooperate better with formatters. Though as I mentioned in the issue, this is really more of a problem in rebase-based workflows (such as mine) -- when you do a merge, you can just run the formatter once after you're done with (eventual) conflicts.

Static configuration

When taking a new version of mergiraf, you need to remember to re-run mergiraf languages --gitattributes and update your git config to include any new languages that you want support for. This is not really that hard,

We actually have an issue for this already (#57)!

but it might be nice if it was possible to set up your git attributes to use mergiraf for all file types, and have mergiraf re-route to the built-in git handling for file types that it is unable to handle.

@wetneb has expressed the wish to make Mergiraf able to fall back on line-based merge when the language couldn't be detected, effectively acting as Git's default line-based merge machinery. And from what I could tell, this already... just works? This would also kind of solve the previous problem -- one would be able to just specify * merge=mergiraf.

It's possible that you can already do this with existing git functionality (e.g. this StackOverflow post). If so, we could decide between making the configuration easier by supporting it in mergiraf or just documenting that possibility in the setup instructions.

This is a nice trick btw! But hopefully we won't need to resort to it.


wetneb commented

2025-02-21 18:41:04 +01:00

Author

Owner

@wetneb has expressed the wish to make Mergiraf able to fall back on line-based merge when the language couldn't be detected, effectively acting as Git's default line-based merge machinery. And from what I could tell, this already... just works?

Yes that has been working from the start.

This would also kind of solve the previous problem -- one would be able to just specify * merge=mergiraf.

The only thing that's holding me back from recommending this is figuring out what happens for binary files. Would this invoke mergiraf on files that git detects as binary? And if it mistakenly detects a binary file as a text file and passes it to mergiraf, what happens?
Another thing to consider is the overhead of spawning mergiraf to do a line-based merge, compared to doing it inside git directly. Last time I checked, mergiraf was still noticeably slower than git merge-file, and that's without taking the overhead of spawning a process into account. But a lot of optimizations have been done since.
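For the binary-file question, one defensive option would be for the merge driver to sniff the content itself before attempting any merge. The sketch below uses a NUL-byte heuristic, similar in spirit to the check Git is commonly described as using; the function name `looks_binary` and the 8000-byte window are assumptions for illustration, not something confirmed from Git's or Mergiraf's source.

```rust
/// Number of leading bytes inspected. The exact window is an assumption.
const SNIFF_LEN: usize = 8000;

/// Heuristically decides whether `content` is binary by looking for a NUL
/// byte in its first `SNIFF_LEN` bytes. Text files virtually never contain
/// NUL bytes, whereas most binary formats do early on.
/// Hypothetical helper, not a Mergiraf or Git API.
pub fn looks_binary(content: &[u8]) -> bool {
    let window = &content[..content.len().min(SNIFF_LEN)];
    window.contains(&0)
}
```

A driver could run this on the base, left and right revisions and refuse to merge (reporting a conflict) if any of them looks binary, so that a misdetection on Git's side does not lead to a corrupted file.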


ada4a commented

2025-02-21 19:05:47 +01:00

Owner

@wetneb wrote in #196 (comment):

The only thing that's holding me back from recommending this is figuring out what happens for binary files. Would this invoke mergiraf on files that git detects as binary? And if it mistakenly detects a binary file as a text file and passes it to mergiraf, what happens?

The documentation seems to suggest that diff/merge logic is skipped completely for binary files¹:

This will cause Git to generate Binary files differ (or a binary patch, if binary patches are enabled) instead of a regular diff.

But I guess we could take a look at the source to be absolutely sure...

Another thing to consider is what is the overhead to spawn mergiraf to do a line-based merge, compared to doing it inside git directly. Last time I checked, mergiraf was still noticeably slower than git merge-file, and that's without taking the overhead of spawning a process into account. But a lot of optimizations have been done since.

I'd imagine this is mostly a non-issue now -- ever since #135, if we don't recognize the language, we bail out as early as at line_merge_and_structured_resolution, which is the entry point to lib.rs for mergiraf merge. And we don't even spawn a Git process in that case -- we use diffy-imara (the diffing library) instead.
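
As a rough illustration of that early bail-out, the decision could be sketched as follows. This is not Mergiraf's actual code: `detect_language`, `Strategy` and `choose_strategy` are hypothetical names, and the extension table is a stand-in for the real language detection.

```rust
use std::path::Path;

/// Hypothetical language lookup by file extension; the real detection is richer.
fn detect_language(path: &Path) -> Option<&'static str> {
    match path.extension()?.to_str()? {
        "rs" => Some("Rust"),
        "java" => Some("Java"),
        "json" => Some("JSON"),
        _ => None,
    }
}

/// Which merge path is taken for a given file.
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Unknown language: do a plain line-based merge and stop there.
    LineBasedOnly,
    /// Known language: line-based merge followed by structured resolution.
    Structured(&'static str),
}

fn choose_strategy(path: &Path) -> Strategy {
    match detect_language(path) {
        None => Strategy::LineBasedOnly,
        Some(lang) => Strategy::Structured(lang),
    }
}

fn main() {
    for name in ["src/lib.rs", "notes.txt", "Makefile"] {
        println!("{name}: {:?}", choose_strategy(Path::new(name)));
    }
}
```

The point is only that the unknown-language case returns before any parsing or tree matching happens, so its cost is that of the line-based merge alone.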

https://git-scm.com/docs/gitattributes#_marking_files_as_binary ↩︎


wetneb commented

2025-02-21 19:17:56 +01:00

Author

Owner

And we don't even spawn a Git process in that case -- we use diffy-imara (the diffing library) instead.

By "spawning a process" I mean the fact that git needs to spawn mergiraf itself, compared to doing the merging in the same process. Beyond the spawning of mergiraf, it also needs to create the temporary files on which mergiraf gets to run (but yeah, it's definitely good that mergiraf doesn't spawn further processes itself).


ada4a commented

2025-02-21 19:28:21 +01:00

Owner

@wetneb wrote in #196 (comment):

And we don't even spawn a Git process in that case -- we use diffy-imara (the diffing library) instead.

By "spawning a process" I mean the fact that git needs to spawn mergiraf itself, compared to doing the merging in the same process.

Oh, I see. But since git merge-file is a shell script anyway, I think it spawns enough processes on its own.

Beyond the spawning of mergiraf, it also needs to create the temporary files on which mergiraf gets to run (but yeah, it's definitely good that mergiraf doesn't spawn further processes itself).

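To be fair, it does need to create them either way (see source: https://github.com/git/git/blob/e2067b49ecaef9b7f51a17ce251f9207f72ef52d/git-mergetool.sh#L230-L240).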

So I'd say we're pretty safe performance-wise as long as we don't start with the structured stuff.


wetneb commented

2025-02-21 19:31:33 +01:00

Author

Owner

I think the bash script you have in mind is git mergetool, but what I'm referring to is the merging that is happening in git merge / rebase / …, which I don't think is a shell script. I think by default, the git binary does that merging itself.


ada4a commented

2025-02-21 19:40:14 +01:00

Owner

Oh, you're totally right! Sorry for the confusion...

wetneb referenced this issue

2025-03-07 14:39:44 +01:00

Way to add commutative parent at runtime? #249

wetneb pinned this

2025-04-04 23:44:26 +02:00

wetneb referenced this issue

2025-04-28 18:18:13 +02:00

Plug-in system for language profiles #357