[go: up one dir, main page]

|
|
Log in / Subscribe / Register

Avoiding the merge trap

By Jonathan Corbet
April 17, 2023
The kernel subsystem maintainers out there probably have a deep understanding of the sinking feeling that results from opening one's inbox and seeing a response from Linus Torvalds to a pull request. When all goes well, pull requests are acted upon silently; a response usually means that all has not gone well. Several maintainers got to experience that feeling during the 6.3 merge window, which seemed to generate more than the usual number of grumpy responses related to merge commits. Avoiding that situation is not hard, though, with a bit of attention paid to how merges are done.

When using a distributed system like Git, development is done in numerous parallel tracks, each of which has its own starting point. Even if a particular project starts at the tip of the mainline tree, the mainline itself is almost certain to have moved on by the time that work is ready to land there. Bringing independent lines of development back together is called "merging"; depending on what has changed, any given merge can be simple or a nasty mess of conflicting changes.

There are many projects out there that disallow merges entirely, insisting that their development repository consist of a single sequence of commits. In such projects, developers must rebase their work before proposing it for upstream. The kernel project, though, has no problem with merges; indeed, the development process would not work without them. Consider that Torvalds did 208 pulls during the 6.3 merge window, each of which added commits to the mainline. If each pull request had to be rebased each time Torvalds did a pull to avoid a merge, little actual work would get done. Instead, almost every pull into the mainline results in another merge.

Subsystem maintainers often do merges of their own, creating merge commits that end up being pushed into the mainline with the rest of their work. That is all normal, but it is those merge commits that tend to land maintainers in trouble. Why is it that Torvalds can create hundreds of merges during a typical development cycle, but subsystem maintainers get grumbled at?

There are two things to watch out for when creating a merge in a subsystem tree. The first is that Torvalds insists that each merge be properly explained. Merges are commits too, and their changelog should say why the merge is being done. It is easy to just accept the default message that Git creates when adding a merge commit, but the resulting commit will have no useful explanation, which is the equivalent of waving a large red flag for Torvalds to see. Unexplained merges thus have a high likelihood of generating one of those unwanted replies.

The other hazard is related, but arguably more subtle. Torvalds insists on an explanation of each merge because, it seems, he feels that many merges done by subsystem maintainers are unnecessary. "Back merges", where the current state of the mainline is merged back into a subsystem tree, come under extra scrutiny. Such merges clutter the development history and should be avoided if they are not needed. One way to determine whether a merge is needed is to explain the need for it; if the maintainer cannot write a changelog making the need for the merge clear, then perhaps the merge should not be done at all.

Merges into subsystem trees are done for a number of reasons. These can include merging a topic branch that is ready to head upstream or to bring in work from another developer or maintainer. The need for the merge is clear in this case, and the form that the changelog describing it should take is also clear. Most of the merges done by Torvalds himself are of this type, and each one carries a changelog describing the new functionality that the merge brings. Your editor, who has made a habit of following traffic into the mainline for rather too many years at this point, can attest that the quality of those merge messages has increased considerably over the years.

Almost any other merge is meant to bring in changes that take a different path into the mainline; back-merging from the mainline itself is the clearest example of that. Maintainers often manage two branches for work heading upstream, one for urgent fixes and one for larger work waiting for the next merge window. It is quite common to see, at some point, the fixes branch merged into the development branch, even though the fixes have likely already been sent to Torvalds separately. This is the kind of merge that Torvalds tends to question, though.

This email describes some of his thinking with regard to this sort of internal merge. One should not, he said, merge another branch just because it seems that some other work is going to need the changes found there. Instead, that other work should just be based on the fixes in the first place:

Because the "nice git way" to do that kind of thing is to actually realize "oh, I'm starting new work that depends on the fixes I already sent upstream, so I should just make a new topic branch and start at that point that I needed".

And then - once you've done all the "new work" that depended on that state, only at *THAT* point do you merge the topic branch.

And look - you have exactly the same commits: you have one (or more) normal commits that implement the new feature, and you have one merge commit, but notice how much easier it is to write the explanation for the merge when you do it *after* the work.

As is so often the case, there is no hard rule here. If nothing else, it is not uncommon for new work to depend on both the fixes and supporting work that is in the development branch; that makes it hard to create a base for that work without merging first. And Torvalds acknowledged that some "superfluous merges" are not really a problem — as long as they are adequately explained.

So, for subsystem maintainers who would prefer not to get email from Torvalds in response to a pull request, there are a couple of simple rules to avoid the merge trap. Take the time to write an actual commit message explaining why the merge was done. But, before that, take a moment to think about whether there actually is a reason to do the merge. The result should be a welcome reduction of merge-window stress and a cleaner commit history for the kernel as a whole.

Index entries for this article
KernelDevelopment tools/Git


to post comments

Avoiding the merge trap

Posted Apr 17, 2023 21:12 UTC (Mon) by kleptog (subscriber, #1183) [Link] (10 responses)

This was actually kind of interesting. I'm not a big fan of merge commits, in the default setup of Gitlab/Github they just result in the doubling of the number of commits for no real benefit. There are times they are necessary though, and it's good to see there is some thinking in the kernel development world to avoid useless merge commits.

So I'm agreeing with Linus here. (Though I'm one of those people that rebases continuously and uses Gerrit in cherry-pick mode.)

Avoiding the merge trap

Posted Apr 18, 2023 7:24 UTC (Tue) by taladar (subscriber, #68407) [Link] (9 responses)

If merge commits usually double your number of commits you might be committing too much in one commit on your topic branch.

Avoiding the merge trap

Posted Apr 18, 2023 8:55 UTC (Tue) by adobriyan (subscriber, #30858) [Link] (4 responses)

I've worked at a company where the policy was to create topic branch for everything no matter how small and trivial the change would be.
It effectively doubled the commit count because the most common type of change was "fix 1 bug in 1 commit".

Avoiding the merge trap

Posted Apr 18, 2023 10:12 UTC (Tue) by khim (subscriber, #9252) [Link]

You may look on Android Git tree sometime. Glance here, e.g.

It's waaay more than “doubling of the number” of commits there.

Avoiding the merge trap

Posted Apr 23, 2023 13:34 UTC (Sun) by calumapplepie (guest, #143655) [Link]

If you didn't separate each fix into topic branches, then a number of those fixes would still contain extra merge commits, because each time a fix was committed by someone else while you were working on writing/testing yours, you'd get a merge commit from the `git pull`. (yes you can enable rebase on pull, but see the next sentence) Those extra merge commits (or rebases) would mean more merge conflicts, which you'd need to resolve before doing anything else.

Never committing onto a central branch directly (eg, `1.92-dev`) is a good practice; without it, approval processes are hard, and history can quickly become a mess of repeated splits and merges. The policy is to avoid the mess that can happen from someone trying to make a quick one-commit fix to a problem, but then discovering it doesn't fix the whole problem.

Avoiding the merge trap

Posted May 12, 2023 16:51 UTC (Fri) by rep_movsd (guest, #100040) [Link] (1 responses)

We do that, but in 9 years we havent ever rummaged through the commit history more than a couple of times because it's a small team and quite obvious when and where something changed and who did it

Avoiding the merge trap

Posted May 14, 2023 23:18 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

The real fun comes when someone (usually me) comes out from some bisection session ending up on some "fix things" commit asking why this seemingly unrelated thing was changed with everything else. Bonus points when it's from 5+ years ago. That assumes the bisection even worked because of such "fix compile" commits cluttering the history (forcing merge commits at least lets `--first-parent` skip over unrelated broken topics while traversing the history).

But we're also probably "larger" and also have community contributions that make it hard to go and ask the author for open-ended fix work sometimes.

Avoiding the merge trap

Posted Apr 19, 2023 21:42 UTC (Wed) by kleptog (subscriber, #1183) [Link] (3 responses)

Then I'm wondering what mean by a "topic branch"? In principle, one ticket is one patch unless it's something large. Bug-fixes are very small commits generally. New features also, but if they get larger you split it up across components, say, one for the API change, one for the frontend. The only times recently I've actually made what I would call a 'topic branch" was when we were doing a major refactor creating an entirely new component, that might require 10+ patches. For changes that require only a handful of patches I don't bother with a branch.

The only real criteria is that a single patch should be small enough to be easily reviewable, and must pass all the buildbot tests. Grouping patches by merging branches has always seemed like overhead to me. It's absolutely great that Git has merges and makes them easy, but I don't see how they add value the resulting history.

Avoiding the merge trap

Posted Apr 28, 2023 6:34 UTC (Fri) by LtWorf (subscriber, #124958) [Link] (1 responses)

> In principle, one ticket is one patch

Why?

There can be some refactor done while working on a certain area of the code, that isn't necessarily strictly the fix. If you put all in the same commit it's just harder to review for little reason.

Avoiding the merge trap

Posted Apr 28, 2023 11:11 UTC (Fri) by kleptog (subscriber, #1183) [Link]

> There can be some refactor done while working on a certain area of the code, that isn't necessarily strictly the fix. If you put all in the same commit it's just harder to review for little reason.

Sure, I did say "in principle". In my experience, the vast majority of tickets are single patch changes. If you're doing refactoring of course you split that out. Big changes should be split into logical components. It's a question of experience, you know what kind of thing you like reviewing. But every patch should build and pass the test suite.

In other words, I totally disagree with this proposal. This is working around a bad tool, squishing all the commits after the fact is terrible.

Avoiding the merge trap

Posted Apr 28, 2023 8:38 UTC (Fri) by laarmen (subscriber, #63948) [Link]

Topic branches are useful when you're working on multiple independent topics concurrently. It doesn't matter how many commits on the branches there are, it can be a single commit or dozens of them: while your patchset is being reviewed for inclusion you can switch to other topic branches to work on the other subjects.

To me, part of the value of a merge commit is that it records when and by whom the code was merged into the target branch.

Avoiding the merge trap

Posted Apr 18, 2023 3:37 UTC (Tue) by zorro (subscriber, #45643) [Link] (13 responses)

Can I ask, what is the difference between rebasing and "back merges'?

Avoiding the merge trap

Posted Apr 18, 2023 4:21 UTC (Tue) by apoelstra (subscriber, #75205) [Link] (12 responses)

Rebasing creates entirely new commits. This makes for a fewer total commits and is easier for an initial reviewer, but it loses any obvious connection to the original diffs. (If you know the original commits IDs, you can use git range-diff to compare them, which does some amazing magic to figure out how two sequences of commits compare, decidng which commits were deleted, added, changed slightly, or copied without changes.) So if you are doing a re-review, or the original commits had some useful context attached (e.g. with git-notes), or they appeared in some other branch, rebasing can be destructive.

In my view, short-lived topic branches such as Github/Gitlab "pull requests", ought to be rebased, since they're usually small enough that the loss isn't too great, and may even be loaded into the maintainers' heads, because future reviewers will see only the final state of the change, rather than several versions interspersed with back merges.

Merge commits are also particularly hard to review in git. You can use git-show on them, which does something magical to roughly show you the changes where there were potential conflicts, whether these were manually fixed or automatically fixed by git-merge. But I'm never certain exactly what is being shown. The "correct" way to check merge commits, as far as I can tell, is to do git-diff against all the parents, which is tricky because the resulting diffs will be very large and likely to include differences between the parents themselves rather than just "new changes", whatever that ought to mean, introduced by the merge. So in my view, the fewer merge commits the better.

(For the adversarially-minded people reading this, it is good to be aware that merge commits in git are structurally just like any other commits, except that they have multiple parents. So you can stick evil things into them, and because they're so hard to view -- and AFAICT Gitlab/Github automatically produce tons of these and give no indication that you even *can* view them -- you may well get away with it.)

The story for long-lived branches is more complicated, since here the history loss from rebasing may be significant (basically forcing a re-review of the entire branch's history since it forked), and the rebase itself might involve a ton of extra work.

For a counterpoint to my view, that merge commits are good because they preserve the original commit IDs and therefore the "true history", search LWN or HackerNews for literally any thread about Fossil ;)

Avoiding the merge trap

Posted Apr 18, 2023 18:41 UTC (Tue) by jengelh (subscriber, #33263) [Link] (1 responses)

>The "correct" way to check merge commits, as far as I can tell, is to do git-diff against all the parents [...] you can stick evil things into them, and because they're so hard to view

Under the one condition that there were/are no merge conflicts occurring, revealing evil things amended into merge commits that are not otherwise present in any parents is possible and quite straight-forward.

$ git log --oneline --graph --all
* bb8383e (HEAD -> pizza) Merge branch 'cheese'
|\
| * f314779 (cheese) Add cheese
|/
* 8a8f9fe Add sauce
$ git checkout bb8383e^
$ EDITOR=/bin/true git merge f314779 # hope this is conflict-free
$ git diff HEAD..bb8383e
diff --git a/contents b/contents
index ed33326..925d63d 100644
--- a/contents
+++ b/contents
@@ -1,2 +1,3 @@
sauce
cheese
+pineapple

Avoiding the merge trap

Posted Apr 18, 2023 19:07 UTC (Tue) by apoelstra (subscriber, #75205) [Link]

Oh, this is a great idea!

Amusingly, I have used this technique in the case where I knew there *were* merge conflicts (such as when pulling in upstream changes to an ongoing fork) so that I could easily review the specific choices that were made to deal with them. But somehow it didn't occur to me to use it as a general technique.

This trick does assume that the original committer is using (roughly) the same version of git, the same merge strategy, etc., but in the times that I've used it, these haven't been problems for me.

And I guess it's assuming that git itself isn't malicious ... but if your git is bad, merge commits are the least of your worries!

Avoiding the merge trap

Posted Apr 19, 2023 19:03 UTC (Wed) by jezuch (subscriber, #52988) [Link]

Just wanted to chime in and say on the record that git range-diff is reviewer's best friend secret superweapon.

Volatile git commits

Posted Apr 21, 2023 20:15 UTC (Fri) by marcH (subscriber, #57642) [Link] (8 responses)

> For a counterpoint to my view, that merge commits are good because they preserve the original commit IDs and therefore the "true history", search LWN or HackerNews for literally any thread about Fossil ;)

I can hopefully summarize https://fossil-scm.org/home/doc/trunk/www/rebaseharm.md quickly and save people some time. I think some aspects are relevant to the article.

One major innovation git brought to version control (without unfortunately naming it), is the concept of "volatile" commits. Some people use the awkward "rebasable branch" name, others might use something else. The lack of standard terminology is unfortunate.

It's the simple idea that a commit SHA1 is not final yet because the work is still in progress, under review, etc.

Gerrit's "Change-Id" tracks volatile commits until they're final. Tools like git range-diff (modelled after tbdiff, git series and maybe others) show the evolution of volatile commits https://gitlab.com/gitlab-org/gitlab/-/issues/24096

At some point a commit's SHA1 has to stop being volatile and become final. In the kernel this is when some maintainer (which one?) merges a pull request at the latest.

But the commit SHA1 could become final earlier. Knowing exactly when a commit becomes final can be a surprisingly hard question. So to provide a simple answer to a difficult question, https://fossil-scm.org/home/doc/trunk/www/rebaseharm.md claims that commits should NEVER be volatile! In other words Fossil requires you to share even your mistakes and work in progress, just like CVS and SVN did - or to never commit them.

This simply makes no sense. What next: publishing a record of every time you hit "save" in your editor? Oh wait, cloud documents actually do that :-)

Volatile git commits

Posted Apr 23, 2023 15:33 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (2 responses)

> Gerrit's "Change-Id" tracks volatile commits until they're final.

One question I've had for this is what is expected for commits that get split and/or merged? Do they get multiple Change-Id trailers? Merged in some way? Or is it "one wins, please remember to refresh the Change-Id if you split a commit"?

Volatile git commits

Posted Apr 23, 2023 22:07 UTC (Sun) by excors (subscriber, #95769) [Link]

> One question I've had for this is what is expected for commits that get split and/or merged? Do they get multiple Change-Id trailers? Merged in some way? Or is it "one wins, please remember to refresh the Change-Id if you split a commit"?

A commit can only have a single Change-Id (else Gerrit rejects the push). If you're squashing multiple changes that already had some review, you should edit the commit message to remove all but one Change-Id (maybe keep the one with the most interesting review history), push the new change, add a review comment saying "I've added the changes from <links to the old changes> because <reason>", then go through all the old changes and mark them 'abandoned' and add a comment saying "Superseded by <link to the new change>" or similar.

If you're splitting a change, pick one of the new commits (maybe the biggest one) to keep the original Change-Id, and delete the Change-Id line from all the other commit messages (so the commit-msg hook will automatically generate a new Change-Id for them). That will create some new changes in Gerrit, but Gerrit will recognise the parent/child relationships and display the 'relation chain' in its UI, so reviewers can easily navigate from any change to any other.

Or if you're splitting and you don't want the new changes to depend on each other, rebase each of your commits onto the remote HEAD and push them one by one, so they'll be independent changes with no relation chain. You can also assign them the same 'topic' and they'll be displayed as 'submitted together' in the UI, which is useful when the changes don't technically depend on each other but are logically related and should be reviewed together (and maybe go through CI together). In any case it's probably helpful to add a comment onto the original change pointing at the new ones and vice versa.

I.e. these things aren't handled natively by Gerrit, you have to manually maintain the relationship between commits and changes in a sensible way (which is usually quite straightforward), and use comments to make reviewers aware of any relevant history. (In particular that's important when there are review comments on an old version of one change that are relevant to code that's now in a different change - you don't want to forget about old comments that are still unresolved, and also you want it to be easy for reviewers to find resolved discussions and avoid rehashing the same points.)

In my experience, Gerrit is typically used for fairly small sets of changes - a dozen changes in a relation chain or a topic is probably too many and will make reviewers unhappy. It's best when your work can be reviewed and submitted onto the main branch in a few days. That means it's unlikely you'll need to do major reorganisation of your commits during the review process, so it doesn't matter if the process is a bit clunky. If you're working on something much more complicated that can't go into the main branch until it's all complete, then create a new branch to review and submit in small chunks as you go, and merge(/rebase/cherry-pick/etc) it all back into the main branch later (typically without any further detailed per-commit review, since the relevant reviewers already reviewed each change on the branch, and you just need them to confirm they're happy to merge it).

Volatile git commits

Posted Apr 23, 2023 22:29 UTC (Sun) by marcH (subscriber, #57642) [Link]

Gerrit's Change-Ids are just a magic line in the commit message, so you can do whatever you think is best on a case by case basis.

Code review is not an "exact science" and neither are Gerrit's Change-Id but they are still much better than having no connection at all.

Mercurial Changeset Evolution

Posted Apr 24, 2023 10:59 UTC (Mon) by farnz (subscriber, #17727) [Link] (4 responses)

This is the core idea behind Mercurial Changeset Evolution (hg evolve); commits get split into categories, and you have a separate graph tracking the "evolution" of commits - so that given a commit hash, I can track not just its place in the history graph, but also its place in the evolution graph (which commit was rewritten into this commit by a rebase, which commits were created by rebasing this one etc).

The rest of this comment is a summary of hg evolve for people who've not encountered it in the wild.

The commit states in evolve are ordered, and not all state transitions are safe. A commit can be in one of the following states:

  1. Secret. A secret commit never leaves the repo clone it's found in, and commits created by rewriting a secret commit are themselves secret. Secret commits are volatile in your terms.
  2. Draft. A draft commit can be shared between clones, but there's an explicit understanding that this commit is volatile; sharing draft commits requires the push and pull protocols to understand about the evolution markers.
  3. Public. A public commit is meant to be final. Rewriting commands will refuse to touch a public commit as a result.

Separate to the commit state, you have "markers" to tell Mercurial which commits are in some sense "successors" or "predecessors" of another commit. For example, if I have commit hash 123456, and I use rebase to create commit hash 12ab46, Mercurial will put a marker down to tell everyone that 12ab46 is a successor commit of 123456; if I use split to turn 123456 into ab1234 and bc4567, Mercurial will use markers to tell people that ab1234 and bc4567 are both successors of 123456.

With these markers in place, Mercurial can do more intelligent things when you share mutant history - for example, if you pulled in my commit 123456 and committed 654321 on top of it, then I split 123456 into ab1234 with a child commit bc4567, Mercurial then has enough knowledge to know that when you push back to me, I will want to rebase 654321 as a child of bc4567; if I instead rebased 123456 and 12ab46, Mercurial can use the extra knowledge that 654321 is a child of 123456 to do a better job of automerging and of identifying what caused the conflicts.

Now, with that said, there's lots of issues to resolve before this can be considered production-grade:

  • Sharing markers and associated commits between repos in a batch operation is challenging. You obviously want to share all public commits, and don't want to reveal anything about secret commits, but you don't want to share all draft commits if you can avoid doing so, since many of them are uninteresting. There's also the question about which markers to share - if you and I both know about 123456, which I rebased into 12ab45, and then I rebased 12ab45 into 12cd36, do you need to know about that intermediate rebase, or should I simpy tell you that 123456 was rebased into 12cd36 in my repo?
  • Working with volatile commits is inherently more error-prone than working with non-volatile commits. If a commit cannot be changed, then I can't accidentally rewrite it - but if I do accidentally rewrite the "wrong" commit, or if I decide to undo a rewrite having realised I did the wrong thing (e.g. rebased my topic branch onto "2022-04-release" when I meant to rebase onto "2023-04-release"), how should this be handled, given that you might be working on top of my topic branch, and might have seen the error?
  • How do you get the UX around mutable history sharing right? Mercurial requires you to tell it which repos can take part in mutable history sharing, and which ones can't, and automatically marks commits as "public" if they come in from repos that aren't taking part in mutable history sharing - but is this the right way to do things. Similarly, Mercurial has the hg evolve command for resolving things that can be automatically handled after a pull or similar - but again, is this the right UX?

Mercurial Changeset Evolution

Posted Apr 24, 2023 20:37 UTC (Mon) by marcH (subscriber, #57642) [Link] (3 responses)

Thanks for the really good write-up!

> Now, with that said, there's lots of issues to resolve before this can be considered production-grade

So "hg evolve" solves a very real and very common problem but... maybe it goes too far? Maybe it tries to do too much? Not an issue considering this is a non-official extension. If only solid learnings from hg evolve gradually trickle to core hg then everything works as intended.

Maybe not all problems are worth solving. This reminds me the dismay of some macOS users when it eventually left behind file creator codes, replaced with .txt, .doc and other .xls file extensions https://arstechnica.com/gadgets/2001/08/metadata/

We recently got "git range-diff" which was based on tbdiff, git series and maybe others and this is great but I'm afraid there's nothing comparable to "hg evolve" in the git universe, is there? Can you even experiment and create some sort of "git extension" easily? Afraid not, tbdiff and git series had no dependency on core git AFAIK.

> Secret commits are volatile in your terms.

The distinction between volatile-secret versus volatile-draft seems very useful.

Short of something like hg evolve, I wish the git community could at least standardize on the most basic terminology. I've seen countless blank stares from people who don't care about much about version control (= most developers) when trying to discuss these topics. Not surprising considering the most basic concepts don't even have agreed names.

Also, I really hope mercurial survives, competition is necessary and this discussion is great evidence of that.

Unfortunately https://bitbucket.org/blog/sunsetting-mercurial-support-i...

Mercurial Changeset Evolution

Posted Apr 24, 2023 21:24 UTC (Mon) by kleptog (subscriber, #1183) [Link] (2 responses)

> but I'm afraid there's nothing comparable to "hg evolve" in the git universe, is there? Can you even experiment and create some sort of "git extension" easily?

The thing is that at its core, git really is a stupid content tracker. So it focusses on mechanism rather than policy. I can understand the distinctions that "hg evolve" is drawing, but I'm not really understanding why I need the tool to track this for me. (Nice summary by the way).

Secret commits: if I don't want people to see them I can just not push them. But then again, none of my code is particularly special so if I accidentally pushed it somewhere it wouldn't matter anyway. They'd just become draft instead of secret.

Draft commits: these exist in an ephemeral branch in the Gerrit/Gitlab I push to. If I pull someone else's draft branch it gets a remote ref and it will be automatically cleaned up once the ephemeral branch goes away again, or merged.

Public commits: are those merged with one of the branches destined for release. These are also effectively immutable.

So you could add something in git to track the status per commit, but in the end you can determine its status purely by examining the branches it exists in.

That said, I think if "hg evolve" can solve all the outstanding issues it would provide a good framework for describing the problem. The predecessor/successor markers sound pretty nifty though, going further than what Gerrit Change-Id's get you. It could probably be reverse engineered from the reflog, but otherwise would require some work in the frontend.

In any case, I don't think git will ever start implementing policies to prevent people doing things. It's just too convenient when a tool doesn't get in your way if you're doing something unusual.

Mercurial Changeset Evolution

Posted Apr 25, 2023 10:01 UTC (Tue) by farnz (subscriber, #17727) [Link] (1 responses)

So the point of the distinction between "secret", "draft" and "public" is to prevent user error.

If you try to push a set of commits where one of them is secret (e.g. with the equivalent of git push origin HEAD), Mercurial will refuse and tell you which secret commits you tried to push. You have to use hg phase to change the secret commit to draft; similarly, someone pulling from your repo cannot pull secret commits until you use hg phase to make them draft.

Mercurial knows at the protocol level whether a remote branch is immutable or not - and will automatically convert commits from draft to public when they're pushed to an immutable branch. Similarly, if you pull from an immutable branch, it can include markers to tell you that a public commit obsoletes one of your draft commits, whereas if you pull from a mutable branch, it'll keep the commits as drafts.

You can always tell Mercurial that it's got it wrong with hg phase --force, and convert a public commit to a draft. Mercurial's revset language makes it relatively easy to script this, too, because you can define a revset that catches all the commits you're going to work with, and force their phase to draft.

There is, though, a different mindset between git and hg in this respect - git assumes that the user is infallible, and trusts you to never make a mistake. hg assumes that you're fallible, and expects to provide "safety rails" so that you can do your "normal" workflow without fear of doing something bad (like accidentally modifying a commit that's supposed to be immutable). But, like any good safety rails, hg provides ways for you to remove them - that's not going to be part of your normal workflow (you wouldn't do hg phase … && hg rebase … normally), but it's there for you if you need it.

Mercurial Changeset Evolution

Posted Apr 25, 2023 18:26 UTC (Tue) by marcH (subscriber, #57642) [Link]

> So the point of the distinction between "secret", "draft" and "public" is to prevent user error.

Well put, thank you again!

[The rest is answering HEAD~2]

> > Secret commits: if I don't want people to see them I can just not push them. But then again, none of my code is particularly special so if I accidentally pushed it somewhere it wouldn't matter anyway. They'd just become draft instead of secret.

Maybe "private" would be a better name. "Secret" sounds like helping with security but "security secrets" need to be shared within a team more often than not and this is clearly not the intention of that sort of "secret".

> > Public commits: are those merged with one of the branches destined for release. These are also effectively immutable.

"Public" looks again like a poor name because linux-next style branches are public and they're not immutable. I find "Final" better.

> > The thing is that at its core, git really is a stupid content tracker. So it focusses on mechanism rather than policy. I can understand the distinctions that "hg evolve" is drawing, but I'm not really understanding why I need the tool to track this for me. (Nice summary by the way).

It's very likely that "hg evolve" goes much further than core git will ever go for that reason. There is often a fine line between holding the users' hand too much and making sure they have all the rope they need to hang themselves with and yes we know on which side of that line git always prefers to err :-)

That's all fine and good however the mere _concepts_ that "hg evolve" deals with (private/public/draft/volatile/final, etc.) are not just "policy". They became universal since we moved away from centralized version control yet they are missing from even git's documentation, which is a bit ironic when you think about it. The evidence is that you can easily find famous lectures about "don't rebase public branches"... where the term "public" does not make sense because linux-next is public. There are also regularly articles on these topics in some weekly Linux gazette.

Don't just "blame the users"; if they need to be repeatedly "educated" then there's probably something wrong or missing with the tool instead. Well at least don't blame most users most of the time, if you do that then it's clear evidence that there is a gap of some kind. Maybe not a big feature gap as big as "hg evolve" (although we just got "git range-diff", so there was clearly at least that gap) but at the very least a big _conceptual_ gap. I critized "hg evolve"'s terminology but it's at least unambiguous because it's defined by its implementation. Git's counterpart to "hg evolve" is emptiness, not even basic conceptual documentation and IMHO articles like this one keep trying to fill that void.

Another demonstration of this conceptual gap is when people discover how most Github projects use git in a totally different way yet they struggle to describe the difference in less than a few hundred words: https://github.com/zephyrproject-rtos/zephyr/issues/53566

Hey maybe _I_ should stop complaining and "send patches" to git man pages offering some standard terminology, I bet it could be already useful in some existing man pages. This shouldn't require any development knowledge and would be a start

https://public-inbox.org/git/70ccb8fc-30f2-5b23-a832-9e470787a945@intel.com/

Back merge to resolve conflicts

Posted Apr 20, 2023 8:43 UTC (Thu) by epa (subscriber, #39769) [Link] (4 responses)

A common reason to merge master into your branch is to resolve conflicts so that the final merge of the branch into master will go smoothly. Suppose you have been hacking on a new feature and it's nearly ready for inclusion. Since you started, some variable has been renamed in master and your branch will no longer merge without conflicts. You could send it to Linus with the conflicts and ask him to resolve them, probably by picking the newer version of the code but applying the variable rename. However, this kind of drudge work is best done by someone whose time is less valuable than Linus's.

So I would merge the current state of master into my branch, resolving the conflict at that point. My branch now applies cleanly to master and can be merged upstream easily. Now, the commit history of the project contains a weird-looking "reverse merge", but personally I don't have a problem with that. I worry about being able to see the history of the code, not so much about a pretty-looking commit graph.

I'm not really talking about "real" conflicts, where the structure of the code has changed substantially, or an interface you were relying on has been removed. Those would need careful attention in any case and can't really be fixed just in a single merge commit (too much stuff going on at once). I am talking about "trivial" conflicts where a well-trained monkey could get your branch into shape to apply cleanly to master.

Another solution would be to rebase the branch against current master, but rebasing is often frowned upon (for reasons we don't need to argue about here). I guess the other alternative is to do a kind of "manual pre-merge" where you see what merge conflicts would occur, and fix up your branch to pre-empt them. In this example you'd manually rename the variable in your branch. But that's awkward and manual. Doing a merge and resolving conflicts is often easier.

Back merge to resolve conflicts

Posted Apr 20, 2023 9:32 UTC (Thu) by farnz (subscriber, #17727) [Link] (3 responses)

Pre-handling the merge conflicts is what git rerere is meant to help with.

The idea is that having done the merge of main into your topic branch and resolved the conflicts, git will track your manual resolution, and reuse it when you merge your topic branch into main. The sequence would look something like:


$ git config rerere.enabled true # to enable rerere to do its tricks in this project
$ git config --global rerere.enabled true # to enable rerere to do its tricks in all projects
$ git switch topic
$ git merge main
$ ... resolve conflicts
$ git commit
$ git reset --hard HEAD^ # rewind the test merge
$ ... work happens on both topic and main branches
$ git switch main
$ git merge topic
$ ... git will have reused all your merge conflict resolutions from before, ready for you to review

You still have to review your merge commit and confirm that it does the right thing (using git add to tell git about reviewed conflicts) but you no longer have to keep track of the merge conflicts by hand, and you get the same effect as doing the merge of main into topic before merging topic into main, but without the "weird" (by Linus Torvalds' standards) merge of main into topic. If you trust git rerere to do the right thing, then you can also set the rerere.autoUpdate config option, to have git automatically use rerere's output, rather than waiting for manual review.

Back merge to resolve conflicts

Posted Apr 20, 2023 10:25 UTC (Thu) by epa (subscriber, #39769) [Link] (2 responses)

That looks great. So at the final step, you have merged your branch into main -- then how do you send that upstream? I guess I would need to create a kind of dummy branch from main at that point, and send that dummy branch upstream as a merge request. Let's assume that I am the contributor, so in the end it's not my decision what goes into the main branch, but I want to make it easy for the project owner to incorporate my change without conflicts.

Back merge to resolve conflicts

Posted Apr 20, 2023 11:05 UTC (Thu) by farnz (subscriber, #17727) [Link] (1 responses)

This all starts to depend on the details of your workflow, and is thus challenging without knowing exactly how you work.

In the kernel workflow, you'd send your merge commit into main to upstream along with your topic branch, and let upstream make its decision about what to do from there - upstream can then see how you resolved conflicts (via your merge commit), and can either copy your resolution while doing their own merge, or merge your merge commit into their version of main, or rebase your merge commit. Up to them at this point, basically.

Back merge to resolve conflicts

Posted Apr 20, 2023 11:44 UTC (Thu) by epa (subscriber, #39769) [Link]

Right, you can send upstream a particular commit, and they can apply it or merge it or whatever. I was stuck in the github/gitlab way of thinking, where I didn't see how I could send that commit to upstream because it didn't have its own branch to live on, and each merge request is for a particular branch.

In the end it boils down to the same thing. I can merge main into my branch, then send the result to upstream. Or I can merge my branch into main, and then send upstream that commit (with its ancestors). The only difference is which way round the commit graph is drawn.

Avoiding the merge trap

Posted Apr 21, 2023 19:36 UTC (Fri) by marcH (subscriber, #57642) [Link] (3 responses)

> Consider that Torvalds did 208 pulls during the 6.3 merge window, each of which added commits to the mainline. If each pull request had to be rebased each time Torvalds did a pull to avoid a merge, little actual work would get done.

git pull --rebase is in fact what maintainers do (or something equivalent) in projects with a linear git policy and not just on github and not just in small / toy projects.

Rebasing does of course increases the chances of conflicts a bit but I'd be curious to see some evidence that this increase would cause "little actual work would get done". This increased resolution effort would be somewhat distributed across all maintainers because people working in the same area have obviously higher chances of conflicting and maintainers at a lower level have to follow the same linear git policy too.

Of course there are plenty of other good reasons not to _want_ to rebase besides git conflicts and in fact the article explains some of them and some comments do that too so this is never going to happen in the kernel but again it is happening and working in other, non-toy projects.

Avoiding the merge trap

Posted Apr 21, 2023 19:52 UTC (Fri) by marcH (subscriber, #57642) [Link] (2 responses)

One of the reasons for NOT rebasing is "logical" conflicts that git does not see, typical example: some API change. No git conflict but compilation fails.

In that case merging is superior because in case of a logical conflict then only the merge commit is broken, all the commits leading to it are still functional on both branches. When rebasing instead then ALL the rebased commits are broken by such a logical conflict. Forensics are also made more difficult.

This is a good example of why people say "rebasing is lying": because the original commits were well tested but the rebased ones never.

One way to mitigate this is to actually TEST the rebased commits in CI:
- https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
- https://docs.github.com/en/repositories/configuring-branc...
- https://www.chromium.org/developers/testing/commit-queue/...

Testing is of course useful even when not rebasing, the same process can validate merges too!

PS: I took a compilation error to simplify but logical conflicts can of course be much more subtle and show up only at run time.

Avoiding the merge trap

Posted Apr 22, 2023 14:24 UTC (Sat) by kleptog (subscriber, #1183) [Link] (1 responses)

> One of the reasons for NOT rebasing is "logical" conflicts that git does not see, typical example: some API change. No git conflict but compilation fails.

> In that case merging is superior because in case of a logical conflict then only the merge commit is broken, all the commits leading to it are still functional on both branches. When rebasing instead then ALL the rebased commits are broken by such a logical conflict. Forensics are also made more difficult.

The conflict is there, so needs to be resolved somewhere. The downside of doing it in the merge commit is that it's probably being done by someone other than the author. By asking the authors to rebase you push the work to the people who are most qualified to do it.

> One way to mitigate this is to actually TEST the rebased commits in CI:

Well of course you should test the individual commits again after rebasing, anything else is silly.

Avoiding the merge trap

Posted Apr 22, 2023 15:17 UTC (Sat) by marcH (subscriber, #57642) [Link]

OK, I shouldn't have taken the example of a compilation failure. It was too simplistic.

If there's a compilation failure then there is a very good chance the maintainer will see it BEFORE either git pull or git pull --rebase and the project policy does not really matter because neither merge policy will happen.

> By asking the authors to rebase you push the work to the people who are most qualified to do it.

Yes but that was not the point I was trying to make.

My previous post was about when the maintainer _misses_ subtle logical conflicts and merges two branches that work separately but fail some run time test together. This is rare but it happens.

In that broken case then a merge gives a "less broken" and more accurate git history compared to git pull --rebase

(Merge train testing and similar prevent the merge no matter what the policy is)


Copyright © 2023, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds