Investigating GitLab
Daniel Vetter began his talk in the refereed track of the 2018 Linux Plumbers Conference (LPC) by noting that it would be in a somewhat similar vein to other talks he has given, since it concerns tooling and workflows that are outside of the kernel norm. But, unlike those other talks, which covered changes that had already taken place, this talk was about switching the open-source graphics projects to a hosted instance of GitLab, which has not yet happened. In it, he shared his thoughts on why migrating to GitLab makes sense for the kernel graphics community—and maybe the kernel as a whole.
The Direct Rendering Manager (DRM) kernel subsystem is a fairly small part of the kernel, he said. It is also a fairly small part of the open-source graphics stack, which is under the X.Org umbrella. DRM sits between those two larger projects, so it has picked up development tools and workflows from both.
The kernel brought DRM into the Git world in 2006, just a year after Git came about; it was a "rough ride" back then, Vetter said. With Git came "proper commit messages". Prior to that, an X.Org commit message might just be a single, unhelpful line; now those messages explain why the change is being made and what it does. The idea of iterating on a patch series on the mailing list came from the kernel side, as did the "benevolent dictator" model of maintainership. DRM, the X server, Wayland, and others all followed that model along the way.
From the X.Org side came things like the committer model; in Mesa, every contributor had commit rights. That model has swept through the graphics community, so now DRM, the X server, and Wayland are all run using that scheme. Testing and continuous integration (CI) are practices that DRM has adopted from X.Org; the kernel also does this, but DRM uses the X.Org approach, tooling, and test suites. For historical reasons, "almost everything" is under the MIT license, which comes from the X.Org projects as well.
There has been a lot of movement of tools and development strategies in both directions via the DRM subsystem. He thinks that using GitLab may be "the next big wave of changes" coming from the user-space side to kernel graphics, and maybe to the kernel itself eventually. This won't happen this year or next year, Vetter predicted, but over the next few years we will see GitLab being used more extensively.
Pain points
There are some pain points in the current development process, he said. The "git send-email" approach to sending patches is popular, but teaching newcomers how to make it work for them is not trivial, which makes it something of a barrier to entry for mailing-list, patch-based communities. GitLab turns a patch submission into a pull request made through a web page. Pushing and pulling Git branches also works well through corporate firewalls using HTTPS, allowing a more familiar browser-based workflow to be used.
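The setup burden he refers to is easy to see: before git send-email works at all, a contributor must configure outgoing mail, typically with something like the following ~/.gitconfig fragment (the server name, port, and user here are placeholder assumptions, not a recommended configuration):

```ini
# Outgoing-mail settings needed before "git send-email" can work;
# the values are illustrative placeholders for a corporate SMTP setup.
[sendemail]
	smtpServer = smtp.example.com
	smtpServerPort = 587
	smtpEncryption = tls
	smtpUser = developer@example.com
```

Only after that, plus any site-specific authentication wrinkles, can a series be sent with something like `git send-email --to=dri-devel@lists.freedesktop.org *.patch`; the merge-request workflow replaces all of this with a `git push` and a button on a web page.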
On the administration side, supporting the current workflow style requires keeping a "bouquet of services" running: mail servers for SMTP, mailing list servers, Git servers, and so on. On the freedesktop.org (fd.o) site, where most of the open-source graphics work is housed, the administrators would like to move to a single maintained solution that has a lot less duct tape holding it together. Over the past few years, some projects have moved away from fd.o to GitHub in order to switch to that style of workflow. But kernel developers have some experience in building on top of proprietary tools (i.e. BitKeeper). The fd.o administrators would like to stay in control of their own tools and not be at the mercy of a vendor, Vetter said.
Projects like the kernel have turned to Patchwork to fill in some of the gaps. It has turned out not to be the solution that was hoped for, for a variety of reasons. It tries to follow the discussion of patches on a mailing list, but parsing that kind of thread is tricky and Patchwork gets confused regularly; humans are really needed to make sense of a complex patch-review thread.
Patchwork also loses semantic information that would be useful to retain. For example, the history of previous versions of a patch is maintained by Git, but it is lost when the patch is posted to a mailing list, so it is lost to Patchwork as well. Nor is it clear in Patchwork which branch a patch is aimed at. For the DRM subsystem, there is a hard rule that patches submitted to the list target the integration tree, but sometimes patches that backport something for a stable tree are posted, which confuses things.
In addition, Patchwork is only a side channel: reviewers can't comment in it, nor can it indicate that a patch has been pulled. The source of truth for that style of workflow is the mailing list; Patchwork only provides a read-only view of it. It is difficult for a maintainer to see which patches are actually pending and which are in other states (older versions, already merged, experimental, rejected, and so on). It is also hard to keep Patchwork in sync with multiple inboxes, so it is not well suited to group maintainership.
Another area with pain points is CI. The right way to integrate CI into the workflow, he said, is to ensure that developers get positive confirmation that their patch has passed the tests—or useful feedback if it hasn't. Because of the mailing-list-based workflow, more spam has to be created to give that feedback on the build and test status of patches. Patchwork does show CI status, but it is not necessarily obvious, since it is buried in results for patches that are not relevant for various reasons.
GitLab
Fd.o started looking at different solutions for replacing its infrastructure and has decided on running the GitLab software. No one wants to re-experience the BitKeeper situation, so GitHub was not really considered, but other options (e.g. Pagure) were looked at. GitLab's biggest downside is that it is an "open-core" product but, largely due to Debian's efforts, it has a reasonable open-source approach, he said. Contributions just require a developer certificate of origin (DCO), which is what the kernel uses, and availability under the MIT license.
In addition, GitLab the company cares about big project workflows. Debian, GNOME, Khronos, and others are using it with good results. Vetter is hoping that it is easier for contributors to learn than existing workflows. If GitLab the company "goes evil", there are enough projects using the code that they will be able to keep the project going. GitLab comes with "batteries included": CI, issue tracking, Git repositories, and more. It will allow the fd.o administrators to get rid of a bunch of services on the fd.o infrastructure.
Large project workflows are important for the kernel and even for the DRM subsystem. The AMD and Intel drivers are big enough that putting them in a single repository would be problematic. The X.Org model is to have lots of different repositories, issue trackers, and discussion channels; each project has its own set. There is a need to be able to move issues back and forth between the sub-projects and to be able to establish a fork relationship between repositories after the fact so that pull requests can be sent from one to the other; GitLab supports both. In another sign that it cares about large project workflows, Vetter said, GitLab is working on a mechanism to create multi-repository pull requests that the CI system will handle correctly; that would allow a change to the whole graphics stack—user space to kernel—to go in as one entity.
There are also some GitLab features that are worth considering for adoption. Merge requests—similar to pull requests—collect a Git branch, including its history, a target branch, discussion, review, and CI results into one bundle that can be tracked and managed as a single thing. That is basically everything an emailed pull request provides, with CI status sent as follow-up emails, but much of that information gets lost in Patchwork.
A bigger problem is patch review, Vetter said; people have panic attacks if you say that you are going to take their email setup away. They have invested a lot of effort into setting up for email review. GitLab has only recently added the idea of per-patch review, as a feature request from the GNOME project. The data model is reasonable, but the user interface for per-patch review is not really usable at this point; it will get better over time, but until then large and complex patch series will need to be reviewed on the mailing lists. The merge request feature may help track the evolution of the series, with links to the email discussion threads.
The CI features of GitLab are particularly good, he said. On the server side, Docker and scriptlets can be used to build a fully customized CI infrastructure. There is support for running tests in customized ways, so a GPU can be used for accelerating the tests rather than just using the software fallback, for example. Every repository and fork has its own CI settings file, which allows for more customization.
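That per-repository settings file is .gitlab-ci.yml at the top of the tree. As a hedged sketch of what such a pipeline might look like (the image name, packages, and build commands are illustrative assumptions, not taken from any fd.o project):

```yaml
# Minimal illustrative .gitlab-ci.yml: one build stage and one test
# stage, each running in a Docker image the project chooses itself.
stages:
  - build
  - test

build-job:
  stage: build
  image: debian:stable          # image is per-job, fully project-controlled
  script:
    - apt-get update && apt-get -y install build-essential meson ninja-build
    - meson setup build && ninja -C build
  artifacts:
    paths:
      - build/                  # hand the build tree to later stages

test-job:
  stage: test
  image: debian:stable
  script:
    - ninja -C build test       # runs against the artifacts from build-job
```

Because every fork carries its own copy of this file, a contributor can adjust the pipeline in a branch and have the changed pipeline run on that branch's merge request.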
Testing the drivers on real hardware is not suitable for the cloud, so those kinds of tests can be run locally. The results of those tests can be fed into the merge request so they can be tracked as well. Failing the CI tests can block merging, which is important for some workflows. The CI features also provide full transparency; developers can watch their CI jobs run in the cloud, he said.
One downside is that Docker, which is used for CI, is "not so great". In theory, it gives full control over the build environment, but if you need to build Docker images inside Docker, "the root escape hatch is needed, which is not so great for a shared hosting setup". Graphics projects want to be able to use KVM and binfmt_misc for running cross-compiled code, but that is not well supported. The fd.o administrators have been wrestling with the cloud runners for GitLab for half a year or so; it is not really working right at this point but, once the problems get worked out, it is hoped that it will all work well.
Automation and more
There is a need for automation to relieve the maintainers from having to do "silly repetitive work", like running scripts to check the code. GitLab has some support for that with webhooks, but they require a server somewhere to run the hooks and are not ("yet?") up to the level of GitHub Actions. Automating the repetitive parts of a maintainer's job is an area that he thinks has big potential for reducing burnout and generally smoothing out the development process.
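A webhook integration of the kind described here boils down to a small HTTP service that receives JSON event payloads from GitLab and reacts to them. A minimal sketch, using only the Python standard library (the secret token, port, and the printed "reaction" are hypothetical; the header names and payload fields are GitLab's documented ones):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

SECRET_TOKEN = "change-me"  # hypothetical; must match the GitLab webhook config


def summarize_merge_request(payload):
    """Extract the fields a maintainer bot would act on from a
    GitLab merge-request event payload."""
    attrs = payload.get("object_attributes", {})
    return {
        "action": attrs.get("action"),
        "source_branch": attrs.get("source_branch"),
        "target_branch": attrs.get("target_branch"),
        "title": attrs.get("title"),
    }


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # GitLab sends the configured secret in the X-Gitlab-Token header
        if self.headers.get("X-Gitlab-Token") != SECRET_TOKEN:
            self.send_response(403)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        if self.headers.get("X-Gitlab-Event") == "Merge Request Hook":
            summary = summarize_merge_request(payload)
            # A real hook would kick off checkpatch-style scripts here
            print("merge request event:", summary)
        self.send_response(200)
        self.end_headers()


def run(port=8080):
    """Start listening for webhook deliveries (blocks forever)."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

This is the "server somewhere" cost Vetter mentions: unlike a built-in automation system, someone has to host and keep this process running.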
The fd.o administrators are worried about Bugzilla and how it interacts with the EU's General Data Protection Regulation (GDPR): it looks like you are submitting private information, but Bugzilla then sends that information out as email to a public list. There are a bunch of warnings in the fd.o Bugzilla instances, but fd.o would like to stop running Bugzilla entirely. The GitLab issue tracker has several nice features, including per-repository templates for bug reports. All of the customization is done using labels, which is "a bit unstructured, but powerful", Vetter said.
Most fd.o projects have migrated their Git repositories, at least, to gitlab.freedesktop.org. The kernel graphics migration is blocked until early 2019 because of the size of its repositories. There has been lots of experimenting with CI; around 8,000 CI runs have been done so far this year. Projects are migrating their issue tracking to GitLab and starting to use its features; there have been around 2,000 merge requests so far.
In summary, Vetter said that Patchwork is a solution to a self-inflicted problem; five years ago, he would have called that claim "nonsense", but his opinion has changed based on the semantic information that is lost with Patchwork. Fundamentally, maintainers want to track merge requests, with all of the history, CI results, and so on collected together. GitLab CI is "awesome", he said, but Docker and the cloud are less so. GitLab has fallen behind GitHub in terms of automation, but he is hopeful that it will catch up. GitLab patch review is currently "bad", but he thinks it has some potential; it will get better over time. Graphics developers and the fd.o administrators are excited about GitLab, but it remains to be seen whether its adoption spreads further than that.
There was some concern from the audience about the open-core nature of GitLab. Vetter noted that, unlike some other open-core projects, GitLab does all of its development in the open; there are quite a few contributors to the code from outside of the company. The issue tracker is open as well, though there are some bugs that are hidden because they are customer-specific. The company has been open to moving features from the enterprise edition to the community edition at the request of GNOME or other large projects that are adopting GitLab. That is no guarantee moving forward, of course, but for now it is working well.
A YouTube video of the talk is available, as are the slides [PDF].
[I would like to thank LWN's travel sponsor, The Linux Foundation, for assistance in traveling to Vancouver for LPC.]
| Index entries for this article | |
|---|---|
| Conference | Linux Plumbers Conference/2018 |