Change IDs for kernel patches

By Jonathan Corbet
August 29, 2019

For all its faults, email has long proved to be an effective communication mechanism for kernel development. Similarly, Git is an effective tool for source-code management. But there is no real connection between the two, meaning that there is no straightforward way to connect a Git commit with the email discussions that led to its acceptance. Once a patch enters a repository, it transitions into a new form of existence and leaves its past life behind. Doug Anderson recently went to the ksummit-discuss list with a proposal to add Gerrit-style change IDs as a way of connecting the two lives of a kernel patch; the end result may not be quite what he was asking for.

The Gerrit code-review system needs to be able to track multiple versions of the same patch; to do so, it adds a Change-Id tag to the patches themselves:

    Change-Id: I6a007dfe91ee1077a437963cf26d91370fdd9556

The tag is automatically added to the first version of a new patch; developers are expected to retain that tag when posting subsequent versions so that Gerrit can associate the new and old versions. These tags are useful for Gerrit, but they have never been welcome in the kernel community; Anderson posted his missive in the hope of changing that attitude and getting the community to allow (or actively encourage) the use of change IDs in patches:

The basic summary is that I'd like there to be some way to track a logical patch over its lifetime. I don't believe there is a reliable (non-heuristic) way to do this today and I think Change-Id provides a nice solution. While we could come up with a new and different solution (because Change-Id was not invented here), it feels like adopting Change-Id is convenient and easy and provides a true benefit.

The problem Anderson describes is real enough; your editor, who spends a lot of time digging up old versions of patch postings to work out how a patch has evolved over time, can attest to that. Guenter Roeck complained that he has to "use a combination of subject analysis and patch content analysis using fuzzy text / string comparison, combined with an analysis of the patch description" to determine whether a given patch has been merged. There seems to be little doubt that the community as a whole would appreciate a better way to associate a patch's history over time and its final resting place in the kernel repository. That is about where the agreement stops, though.

Linus Torvalds was quick to reject the idea of putting a bare change ID into patch changelogs, citing the same reasoning that has kept those IDs out thus far: they are really only useful to whoever put that ID into the changelog in the first place. Gerrit change IDs are useful to people who know which Gerrit instance is tracking the patch in question and who actually have access to that instance. For everybody else, it's just a number that is just extra noise in the changelog; as Torvalds put it: "A 'change ID' that I can't use to look anything up with is completely pointless and should be removed from kernel history". That assertion also implies, of course, that an ID that can be looked up by third parties might have some value.

One way to make a Gerrit change ID useful, he suggested, would be to turn it into a publicly accessible web link; then anybody could follow the link, see whatever other information exists, and track the history of the patch. Olof Johansson disliked that idea, saying that the Gerrit server could be shut down, making the link useless. Ted Ts'o responded that such a fate could befall any web link, including others (such as bugzilla links) that are accepted in changelogs now.

There may be other ways to solve this problem, though. The idea that Torvalds liked the best — and which seems to have the widest support across the community — is to use the unique ID that is already associated with a patch posting, which is the message ID created by the poster's email client:

The first time it gets magically and reliably created for you without you having to do a single thing. The second time, you just look it up.

Ta-daa - you have a "uuid" that is useful to others, and that describes the whole series unambiguously.

There are a few ways this ID could be presented, but the most popular way is to create a "Link:" tag containing a link to the posting of the patch in a public mailing-list archive server (generally lore.kernel.org in recent times). This is not a new practice; it appears to have first been used for this patch applied by H. Peter Anvin in 2011. Use of this tag is not universal, but it is growing; the number of patches in recent kernels carrying Link: tags is:

Release Tags Percent

4.18 1,413 10.6%

4.19 1,944 13.8%

4.20 1,609 11.6%

5.0 1,778 13.9%

5.1 1,908 14.6%

5.2 2,295 16.4%

5.3 2,614 18.4%

Release	Tags	Percent
4.18	1,413	10.6%
4.19	1,944	13.8%
4.20	1,609	11.6%
5.0	1,778	13.9%
5.1	1,908	14.6%
5.2	2,295	16.4%
5.3	2,614	18.4%

Creation of this tag is relatively easy; it can be entirely automated at the point where a patch is applied to a Git repository. But it doesn't solve the entire problem; it can associate a commit with the final posting of a patch on a mailing list, but it cannot help to find previous versions of a patch. Generally, the discussion of the last version of a patch is boring since there is usually a consensus at that point that it should be applied. It's the discussion of the previous versions that will have caused changes to be made and which can explain some of the decisions that were made. But kernel developers are remarkably and inexplicably poor at placing the message ID of the final version of a patch into the previous versions.

The most commonly suggested solution to that problem is not fully automatic. Developers like Thomas Gleixner and Christian Brauner argued in favor of adding a link to previous versions of a patch when posting an updated version. Gleixner called for a link to the cover letter of the prior version, while Brauner puts links to all previous versions. Either way, an interested developer can follow the links backward to see how a patch series has changed, along with the discussions that led to those changes.

A convention like that would provide most or all of what developers like Anderson are asking for. It would, however, require that developers do some work to insert those links, and not everybody is convinced that this will ever happen. Dmitry Torokhov said that he could not be bothered:

As a patch submitter, I frankly could not care less about proper trace, history, etc. I might be putting cover letter and outline the version changes, but I am doing that to reduce friction and help committer to land my change. That's it.

Developers, he said, simply would not do the extra work to add links to previous postings to their cover letters. Anderson also asserted that "the adoption rate will be near to zero". Such concerns have merit; it is hard to force a community of thousands of developers to do more work for every patch they submit. But without their cooperation, this idea will not go far.

The answer, naturally, is to provide tools that make the right thing happen with a minimum of extra work. Gleixner described the setup he uses with Quilt, but it seems unlikely that all developers will find it useful for their purposes. Joel Fernandes described a tool that he is considering writing that might be more generally useful. Greg Kroah-Hartman described it as overly complex, though, and suggested simply posting patches as a reply to previous versions, but others pointed out that not all mailers would make that entirely easy to do.

Ts'o more-or-less ended the discussion by saying that it was time for interested developers to go off and implement a prototype of the tools they have in mind for this task. Then the code could be evaluated to see how it actually works. "Trying to pick something before people who actually have to use it day to day have had a chance to try it in real life is how CIO's end up picking Lotus Notes". That is where things stand now; the next step will come about when somebody comes forward with a tool that might provide a better solution to the problem. Until then, we'll have to continue to use fuzzy string comparisons and other tricks to track the history of patches in the repository.

Index entries for this article
Kernel	Development model/Patch management

to post comments

Change IDs for kernel patches

Posted Aug 30, 2019 0:08 UTC (Fri) by dianders (subscriber, #73136) [Link] (1 responses)

Nice summary! A further link for those interested in one solution that might be an OK compromise (if anyone adopts it):

https://patchwork.ozlabs.org/patch/1154833/

-Doug

Change IDs for kernel patches

Posted Aug 30, 2019 6:44 UTC (Fri) by diconico07 (guest, #117416) [Link]

For me it completely relates to what Konstantin Ryabitsev said in his blog post:
https://people.kernel.org/monsieuricon/patches-carved-int...

Change IDs for kernel patches

Posted Aug 30, 2019 9:21 UTC (Fri) by kleptog (guest, #1183) [Link] (4 responses)

> Linus Torvalds was quick to reject the idea of putting a bare change ID into patch changelogs, citing the same reasoning that has kept those IDs out thus far: they are really only useful to whoever put that ID into the changelog in the first place.

I find this an odd comment. Given the ID is effectively a globally unique string, simply throwing the Change-ID into $SEARCHENGINE will immediately return any and all mail conversations, bug trackers, etc referring to this patch. This seems much more useful that some email message ID, which is not easily searchable because it contains so many word-breaking characters and significant constant parts.

In any case, even in Gerrit you don't really use those Change-IDs to search for patches, each change has an actual URL too, but that doesn't appear in the commit while it's under discussion. The Change-ID is mostly useful after the fact to locate patches that are related.

Change IDs for kernel patches

Posted Sep 3, 2019 4:25 UTC (Tue) by Fowl (subscriber, #65667) [Link] (2 responses)

> Given the ID is effectively a globally unique string

Indeed - I'm not sure why an actual GUID wouldn't work, they're shorter and don't require mailing anything. I guess they're a bit Microsoft-y.

Couldn't a first/arbitrary comit hash become the 'change ID'? Even if that exact commit never ends up in a tree it's a shortish, unique string that doesn't require any new tooling to generate.

Change IDs for kernel patches

Posted Sep 3, 2019 6:50 UTC (Tue) by khim (subscriber, #9252) [Link] (1 responses)

> Couldn't a first/arbitrary comit hash become the 'change ID'?

You don't really need it: it's highly likely that this particular commit would only exist on a developer's machine and would never be posted.

Change-Id: used by Gerrit is not really created by Gerrit - that's just a random string which git presubmit hook adds to the description if it's not there.

This is done that way precisely because Android developers don't have a central Gerrit server: there are bazillion of them - each vendor have one, Google have half-dozen and so on.

Change-Id just allows you to do a search, nothing more, nothing less.

Change IDs for kernel patches

Posted Sep 3, 2019 7:14 UTC (Tue) by Fowl (subscriber, #65667) [Link]

> You don't really need it: it's highly likely that this particular commit would only exist on a developer's machine and would never be posted.

And this confusion is probably the flaw in the idea - once people read 'commit hash' they'll be assume it's a commit they can find. ;)

Change IDs for kernel patches

Posted Sep 5, 2019 8:54 UTC (Thu) by marcH (subscriber, #57642) [Link]

> I find this an odd comment

It's not just odd, it's plain wrong.

Change-Id are being re-used across Gerrit instances and across branches and it works great.

Change IDs for kernel patches

Posted Aug 30, 2019 10:58 UTC (Fri) by wiktor (guest, #132450) [Link] (1 responses)

> The idea that Torvalds liked the best — and which seems to have the widest support across the community — is to use the unique ID that is already associated with a patch posting, which is the message ID created by the poster's email client

For the record `git am` already has `--message-id` option that can be used to insert back-reference to the patch message's ID: https://git-scm.com/docs/git-am#Documentation/git-am.txt-...

Change IDs for kernel patches

Posted Aug 30, 2019 12:01 UTC (Fri) by johill (subscriber, #25196) [Link]

And we use that, with some commit log mangling, to create the Link: in most cases.

Change IDs for kernel patches

Posted Aug 30, 2019 13:02 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (2 responses)

Hmm. How hard would it be to ask that subsequent submissions use `--in-reply-to=` the previous patchset (usually its cover letter)? That's what I do on my (admittedly rare) submissions. It keeps the patchsets together in mail viewers and provides context for the previous submission(s).

Change IDs for kernel patches

Posted Sep 1, 2019 20:32 UTC (Sun) by pbonzini (subscriber, #60935) [Link] (1 responses)

Most maintainers actually dislike that, because they (or their scripts) are used to having one thread per submission---so if you want to apply temporarily v(n-1) it would be harder to extract the appropriate patches out of a thread that includes multiple versions. Others' scripts might have different issues.

Change IDs for kernel patches

Posted Sep 2, 2019 17:28 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

Hmm. I haven't been told to stop doing it yet (like I said, my submissions are rare). I feel like such scripts could easily be modified to only look at reroll count subject prefixes or only find patches under a given cover letter. *shrug* it's a thought. I don't maintain any kernel bits, but it's probably how I'd run it if I were to do so.

Change IDs for kernel patches

Posted Aug 30, 2019 13:31 UTC (Fri) by rgb (guest, #57129) [Link] (1 responses)

"Greg Kroah-Hartman described it as overly complex, though, and suggested simply posting patches as a reply to previous versions, but others pointed out that not all mailers would make that entirely easy to do."

I agree with Greg. That's what reply is there for, to provide context. Also reply is such a basic functionality I'm really curious what kind of mail program does not support reply?

I doubt that anyone can come up with an alternative to reply, that is as easy to use and adopt by most/all developers.

Change IDs for kernel patches

Posted Aug 30, 2019 15:26 UTC (Fri) by jgg (subscriber, #55211) [Link]

I think my general complaint is that the built in git tooling for sending to a mailing list does not do nearly enough porcelain stuff to make the patch submission flow easy. I'm always sad when using git send email.

But there is some much potential here! Surely someone must have written nice wrapper scripts..

Change IDs for kernel patches

Posted Sep 1, 2019 15:08 UTC (Sun) by glasserc (subscriber, #108472) [Link] (1 responses)

This discussion is interesting, but I wish it didn't focus on the specific use case of code review for the Linux kernel. Any git project where feature branches are revised and shared over time could benefit from having links from one version of a commit to a previous one. I've often wished Git had something like Mercurial's "obsolesence markers" (from the "changeset evolution" extension). From my (non-kernel) perspective, the question of how these links are serialized into emails is not very important, but having them enables a bunch of other features.

History of the history

Posted Sep 5, 2019 9:33 UTC (Thu) by marcH (subscriber, #57642) [Link]

> Any git project where feature branches are revised and shared over time could benefit from having links from one version of a commit to a previous one.

The tool laser-focused on the "history of the history" is https://github.com/git-series/git-series

Unlike email, all tools specialized for code review implement some form of linking between revisions in their database. While such linking is one of the most basic code review features, there are surprisingly big differences between the approaches.

Gerrit (obviously) supports this well for individual patches but doesn't treat series as a "first-class" citizens, patches in a series are just "related":
https://gerrit-review.googlesource.com/Documentation/user... Trying for instance to see how a series has been re-ordered is close to impossible with that user interface. I just fetch the two revisions and compare with git locally.

On the other hand Gerrit's ability to filter out rebase noise has become as impressive as git series': https://bugs.chromium.org/p/gerrit/issues/detail?id=217#c42

Github is probably the worse at this. Series are first-class citizens but the recommended workflow is to never rewrite anything and always add the equivalent of fixup! commits until the review is finished - except there's no way to mark which commit is a fixup for what. Force pushes are possible but break a number of features. More details and references at https://github.com/zephyrproject-rtos/zephyr/pull/14444

Gitlab follows github's model but seems to cope with force pushes much better.

Cover letter

Posted Sep 2, 2019 21:13 UTC (Mon) by meuh (guest, #22042) [Link] (1 responses)

I've used Link: tag inside commit messages to link those to the cover letter, as I found there's quite often the need to look at the patchset to understand a patch. The cover letter, and the patch series give more context, even in the presence of a "good" commit message.

Those links wouldn't be needed if the cover letter was included as commit message for a merge, but, first, merge commit almost only happen as a result of a pull request, second, it's usual for a patchset to go through different trees.

Also, I've made my best to provide a changelog for each patch series, with links to previous iterations. Having the Link: to the cover letter allows access to this changelog.

Anyway, one could wonder when Git would be able to record the patchset history, so that this history is not recorded in mail box, but in proper version control.

Cover letter

Posted Sep 3, 2019 7:44 UTC (Tue) by johill (subscriber, #25196) [Link]

Some maintainers will create a side-branch and merge it for a bigger series of patches, and put the cover letter into the merge commit message, which seems like a good way to capture this. Perhaps it should be done more often.

Change IDs for kernel patches

Posted Sep 3, 2019 23:49 UTC (Tue) by neilbrown (subscriber, #359) [Link]

> This is not a new practice; it appears to have first been used for this patch applied by H. Peter Anvin in 2011.

It is predated by the URL: tag, first used (that I can find) by Dominik Brodowski in 2005.

https://git.kernel.org/pub/scm/linux/kernel/git/history/h...

Git needs a information preserving rebase

Posted Sep 16, 2019 12:00 UTC (Mon) by jrincayc (guest, #29129) [Link] (3 responses)

Side note: Git needs to by default preserve information during a rebase. As in just as each commit records the parents, patches produced by rebase should record the git id of the patches that were used for the rebase. This would allow automated tools to more easily figure out where a patch came from.

Git needs a information preserving rebase

Posted Sep 16, 2019 13:51 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (2 responses)

Maybe only if those commits are actually available somewhere? References to commits which only existed locally before rebases seems like pure noise to me. Also, what do you do for split/joined commits? Do the magic fixup! or squash! prefixes suppress this? What about manual fixup or squash?

(From a heavy git-rebase user :) )

Git needs a information preserving rebase

Posted Sep 16, 2019 16:01 UTC (Mon) by zblaxell (subscriber, #26385) [Link]

Indeed, the whole point of git rebases is that they are *not*
information-preserving. Rebase is a tool to discard irrelevant cruft:
typos, failed experiments, rejected patches, accidentally added binaries,
commits where you forgot to add "-s", and so forth.

If someone comes up with a way to import and archive patch review
discussions (perhaps in a separate branch or tree) and reference them
in git commits in a way that plays nicely with 'git push' and 'git gc',
then rebase should preserve _that_.

Git needs a information preserving rebase

Posted Sep 17, 2019 9:57 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

I just had the same idea when reading the discussion above. We could have "git rebase", "git am", "git commit --amend", "git cherry-pick" automatically append a tag "original-id:" with the ID of the original patch. In case of fixup/squash, it may very well merge all of them. This way we do keep good traceability in patch series with exactly *zero* effort nor risk of mistake on the developer's side. In fact the developer would only make an effort to remove the tag!

This would also cover part of what is already being done with "cherry-pick -x" that's used a lot for backports. And very often the information provided there would be useful because it would reference a commit ID which is available in another public tree, and even if that tree is not publicly available it would at least serve as a common tracking key.