
Investigating GitLab

By Jake Edge
December 5, 2018

Linux Plumbers Conference

Daniel Vetter began his talk in the refereed track of the 2018 Linux Plumbers Conference (LPC) by noting that it would be in a somewhat similar vein to other talks he has given, since it is about tooling and workflows that are outside of the kernel norm. But, unlike those other talks that concerned changes that had already taken place, this talk was about switching open-source graphics projects to using a hosted version of GitLab, which has not yet happened. In it, he wanted to share his thoughts about why he thinks migrating to GitLab makes sense for the kernel graphics community—and maybe the kernel as a whole.

[Daniel Vetter]

The Direct Rendering Manager (DRM) kernel subsystem is a fairly small part of the kernel, he said. It is also a fairly small part of the open-source graphics stack, which is under the X.Org umbrella. DRM sits in the middle between the two, so the project has learned development tools and workflows from both of the larger projects.

The kernel brought DRM into the Git world in 2006, which was just a year after Git came about; it was a "rough ride" back then, Vetter said. With Git came "proper commit messages". Prior to that, the X.Org commit messages might just be a single, unhelpful line; now those messages explain why the change is being made and what it does. The idea of iterating on a patch series on the mailing list came from the kernel side, as did the "benevolent dictator" model of maintainership. DRM, the X server, Wayland, and others all followed that model along the way.

From the X.Org side came things like the committer model; in Mesa, every contributor had commit rights. That model has swept through the graphics community, so now DRM, the X server, and Wayland are all run using that scheme. Testing and continuous integration (CI) also came to DRM from X.Org; the kernel does CI as well, but DRM follows the X.Org approach, tooling, and test suites. For historical reasons, "almost everything" is under the MIT license, which comes from X.Org projects as well.

There has been a lot of movement of tools and development strategies in both directions via the DRM subsystem. He thinks that using GitLab may be "the next big wave of changes" coming from the user-space side to kernel graphics, and maybe to the kernel itself eventually. This won't happen this year or next year, Vetter predicted, but over the next few years we will see GitLab being used more extensively.

Pain points

There are some pain points with the current development process, he said. The "git send-email" approach for sending patches is popular, but teaching new people how to get it to work for them is not trivial, which makes it something of a barrier to entry for mailing list patch-based communities. GitLab turns a patch submission into a pull request through a web page. Pushing and pulling Git branches also works well through corporate firewalls using HTTPS, allowing a more familiar browser-based workflow to be used.
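For reference, the one-time configuration that newcomers stumble over looks roughly like this (a sketch only — the server, port, and user are placeholder values that depend on the mail provider):

```
# ~/.gitconfig fragment -- illustrative placeholder values
[sendemail]
	smtpServer = smtp.example.com
	smtpServerPort = 587
	smtpEncryption = tls
	smtpUser = developer@example.com
```

With that in place, something like "git send-email --to=<list address> HEAD~3" formats and mails the last three commits; getting each of those values right for a particular provider is exactly the hurdle described above.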

On the administration side, supporting the current workflow style requires keeping a "bouquet of services" running: mail servers for SMTP, mailing list servers, Git servers, and so on. On the freedesktop.org (fd.o) site, where most of the open-source graphics work is housed, the administrators would like to move to a single maintained solution that has a lot less duct tape holding it together. Over the past few years, some projects have moved away from fd.o to GitHub in order to switch to that style of workflow. But kernel developers have some experience in building on top of proprietary tools (i.e. BitKeeper). The fd.o administrators would like to stay in control of their own tools and not be at the mercy of a vendor, Vetter said.

Projects like the kernel have turned to Patchwork to fill in some of the gaps. For a variety of reasons, it has turned out not to be the solution that was hoped for. It tries to follow the discussion on patches in a mailing list, but parsing that kind of thread is tricky and Patchwork gets confused regularly; humans are really needed to make sense of a complex patch review thread.

Patchwork also loses semantic information that would be useful to maintain. For example, there is a log of the previous versions of a patch that is maintained by Git, but lost when a patch is posted to a mailing list, so it is lost to Patchwork as well. It is not clear in Patchwork what branch a patch is aimed at. For the DRM subsystem, there is a hard rule that patches submitted to the list target the integration tree, but then sometimes patches that backport something for a stable tree are posted, which confuses things.

In addition, Patchwork is only a side channel. Reviewers can't comment or indicate that the patch has been pulled. The source of truth for that style of workflow is the mailing list; Patchwork only provides a read-only view of it. It is difficult for a maintainer to see which patches are actually pending and which are in other states (older versions, already merged, experimental, rejected, etc.). It is also hard to keep Patchwork in sync with multiple inboxes, so it is not well suited to group maintainership.

Another area with pain points is CI. The right way to integrate CI into the workflow, he said, is to ensure that developers get positive confirmation that their patch has passed the tests—or useful feedback if it hasn't. Because of the mailing-list-based workflow, more spam has to be created to give that feedback on the build and test status of patches. Patchwork does show CI status, but it is not necessarily obvious, since it is also buried in results for patches that are not relevant for various reasons.

GitLab

Fd.o started looking at different solutions for replacing its infrastructure and has decided on running the GitLab software. No one wants to re-experience the BitKeeper situation, so GitHub was not really considered, though other options (e.g. Pagure) were looked at. GitLab's biggest downside is that it is an "open-core" solution but, largely due to Debian's efforts, it has a reasonable open-source approach, he said. Contributions just require a developer certificate of origin (DCO), which is what the kernel uses, and availability under the MIT license.

In addition, GitLab the company cares about big project workflows. Debian, GNOME, Khronos, and others are using it with good results. Vetter is hoping that it is easier for contributors to learn than existing workflows. If GitLab the company "goes evil", there are enough projects using the code that they will be able to keep the project going. GitLab comes with "batteries included": CI, issue tracking, Git repositories, and more. It will allow the fd.o administrators to get rid of a bunch of services on the fd.o infrastructure.

Large project workflows are important for the kernel and even for the DRM subsystem. The AMD and Intel drivers are big enough that putting them in a single repository would be problematic. The X.Org model is to have lots of different repositories, issue trackers, and discussion channels; each project has its own set. There is a need to be able to move issues back and forth between the sub-projects and to be able to establish a fork relationship between repositories after the fact so that pull requests can be sent from one to the other; GitLab supports both. In another sign that it cares about large project workflows, Vetter said, GitLab is working on a mechanism to be able to create multi-repository pull requests that the CI system will handle correctly; that would allow a change to go into the whole graphics stack—user space to kernel—as one entity.

There are also some GitLab features that are worth considering for adoption. Merge requests—similar to pull requests—collect up a Git branch, including its history, a target branch, discussion, review, and CI results all into one bundle that can be tracked and managed as a single thing. It is basically everything that a pull request in email provides, with CI status as followup emails, but much of that gets lost in Patchwork.

A bigger problem is patch review, Vetter said; people have panic attacks if you say that you are going to take their email setup away. They have invested a lot of effort into setting up for email review. GitLab has only recently added the idea of per-patch review, as a feature request from the GNOME project. The data model is reasonable, but the user interface for per-patch review is not really usable at this point; it will get better over time, but until then large and complex patch series will need to be reviewed on the mailing lists. The merge request feature may help track the evolution of the series, with links to the email discussion threads.

The CI features of GitLab are particularly good, he said. On the server side, Docker and scriptlets can be used to build a fully customized CI infrastructure. There is support for running tests in customized ways, so a GPU can be used for accelerating the tests rather than just using the software fallback, for example. Every repository and fork has its own CI settings file, which allows for more customization.
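As a sketch of what that per-repository settings file looks like — the job names, image, and commands here are made up, not taken from any real fd.o project — a minimal `.gitlab-ci.yml` might be:

```yaml
# .gitlab-ci.yml -- hypothetical minimal pipeline
build:
  stage: build
  image: debian:stretch            # container image the job runs in
  script:
    - apt-get update && apt-get -y install build-essential
    - make

test:
  stage: test
  script:
    - make check
  tags:
    - gpu                          # route the job to a runner that has real hardware
```

Because every fork carries its own copy of this file, a contributor can adjust the pipeline in the same merge request as the code change.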

Testing the drivers on real hardware is not suitable for the cloud, so those kinds of tests can be run locally. The results of those tests can be fed into the merge request so they can be tracked as well. Failing the CI tests can block merging, which is important for some workflows. The CI features also provide full transparency; developers can watch their CI jobs run in the cloud, he said.

One downside is that Docker, which is used for CI, is "not so great". In theory, it gives full control over the build environment, but if you need to build Docker images inside Docker, the "root escape hatch is needed, which is not so great for a shared hosting setup". Graphics projects want to be able to use KVM and binfmt-misc for running cross-compiled code, but that is not well supported. The fd.o administrators have been wrestling with the cloud runners for GitLab for half a year or so. It is not really working right yet but, once the problems get worked out, it is hoped that it will all work well.

Automation and more

There is a need for automation to relieve the maintainers from having to do "silly repetitive work", like running scripts to check the code. GitLab has some support for that with Webhooks, but it requires a server somewhere to run the hooks and is not ("yet?") up to the level of GitHub Actions. Automating the repetitive part of a maintainer's job is an area that he thinks has a big potential for reducing burnout and generally smoothing out the development process.

The fd.o administrators are worried about Bugzilla and how it interacts with the GDPR; it looks like you are submitting private information, but it then kicks out that information as email to a public list. There are a bunch of warnings in the fd.o Bugzilla instances, but fd.o would like to stop running Bugzilla. The GitLab issue tracker has several nice features, including per-repository templates for bug reports. All of the customization is done using labels, which is "a bit unstructured, but powerful", Vetter said.
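Those per-repository templates are just Markdown files in the repository itself; a hypothetical sketch (the file name and fields are made up for illustration) could be:

```
<!-- .gitlab/issue_templates/Bug.md -- hypothetical bug-report template -->
## Summary

## Steps to reproduce

## Hardware and driver version

<!-- The quick action below applies labels when the issue is filed -->
/label ~bug ~triage
```

The `/label` quick action at the end is how the label-based customization Vetter mentioned gets wired into a template.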

Most fd.o projects have migrated their Git repositories, at least, to gitlab.freedesktop.org. The kernel graphics migration is blocked until early 2019 because of the size of its repositories. There has been lots of experimenting with CI; around 8,000 CI runs have been done so far this year. Projects are migrating their issue tracking to GitLab and starting to use its features; there have been around 2,000 merge requests so far.

In summary, Vetter said that Patchwork is a solution to a self-inflicted problem; five years ago, he would have called that "nonsense", but his opinion has changed based on the loss of semantic information with Patchwork. Fundamentally, maintainers want to track merge requests, with all of the history, CI results, and so on collected up together. GitLab CI is "awesome", he said, but Docker and the cloud are less so. GitLab has fallen behind GitHub in terms of automation, but he is hopeful that it will catch up. GitLab patch review is currently "bad", but he thinks it has some potential; over time it will get better. Graphics developers and the fd.o administrators are excited about GitLab, but it remains to be seen if GitLab adoption spreads further than that.

There was some concern from the audience about the open-core nature of GitLab. Vetter noted that, unlike some other open-core projects, GitLab does all of its development in the open; there are quite a few contributors to the code from outside of the company. The issue tracker is open as well, though there are some bugs that are hidden because they are customer-specific. The company has been open to moving features from the enterprise edition to the community edition at the request of GNOME or other large projects that are adopting GitLab. That is no guarantee moving forward, of course, but for now it is working well.

A YouTube video of the talk is available, as are the slides [PDF].

[I would like to thank LWN's travel sponsor, The Linux Foundation, for assistance in traveling to Vancouver for LPC.]




Investigating GitLab

Posted Dec 5, 2018 21:01 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (5 responses)

> There is a need for automation to relieve the maintainers from having to do "silly repetitive work", like running scripts to check the code. GitLab has some support for that with Webhooks

Shameless plug time :) .

We migrated to GitLab back in 2015 (from a fork of Gerrit, Gitosis 1.x, and some server-side Python/shell scripts). We ended up writing a tool to do the server-side shell scripts via webhooks. That incarnation of the tool lasted about a year (it was a single-file spaghetti-fest of Python2 code primarily due to time constraints on the deployment). Now, we have a toolset for doing various actions implemented in Rust. It does things like commit checks (whitespace, formatting, etc.), submitting to our testing infrastructure (Buildbot), and can even fix formatting issues automatically (assuming the tool supports it) by rewriting the topic's commits (rather than plopping a fixup commit at the end). It also performs merges for us so that it can collect things like `Acked-by` and `Reviewed-by` from MR comments and adds them to the merge commit message. It is piecemeal, so using the checks and reformat doesn't require that you use its merge action. The other one that is interesting for the kernel is that it can maintain what we call a "stage" which is a collection of MRs which are merged together and can be tested as a batch. Adding and removing MRs from the stage is easy and automatic (and it is recreated as needed). Qt and Rust have similar setups. We also added support for "backporting" which takes a single MR and merges it to multiple branches at once so that the release branch doesn't require tandem MRs.

GitHub now has a better API and webhook structure, but at the time GitLab was vastly better (though Actions are new to me and I'm not familiar enough to comment). Both are very loosey-goosey with their webhook formats though and they're not always well-documented (you get a single example, but a field/type listing isn't available).

The main executable is here: https://gitlab.kitware.com/utils/ghostflow-director and its support libraries are here: https://gitlab.kitware.com/utils?filter=rust

Investigating GitLab

Posted Dec 6, 2018 17:37 UTC (Thu) by bronson (guest, #4806) [Link] (1 responses)

This looks very cool! I hope you'll add an install guide and maybe a blog post or two. One day I'd like to try ghostflow out, but I'd like to know what I'm getting into before I start.

Investigating GitLab

Posted Dec 6, 2018 18:28 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

Thanks! Documentation is one place that needs to be improved (namely the configuration file and, yes, deployment). There are some issues that would probably be nice-to-have for other deployments (like storing check configuration in the repository itself), but they haven't been high priority for us. Once I get these done (maybe I'll find time this weekend for deployment docs at least), I'll report back here.

A quick overview of deployment is that there are 2 binaries, webhook-listen and ghostflow-director. The first is for a never-goes-down webhook listener so that webhooks are always delivered even if the director is restarting or down or whatever. It verifies webhooks and classifies them as well. The second does the actual git manipulations with a local clone of the repository. They're also designed so that there's no (non-configuration) state that isn't stored in the repository or service itself. For the service-side, GitLab just involves making an account and giving it Maintainer permissions on the repositories it will act on. Github involves creating an App and an associated account (for an SSH key). Both services require an SSH key it can use for pushing to the relevant repositories.

Investigating GitLab

Posted Dec 15, 2018 21:08 UTC (Sat) by simlo (guest, #10866) [Link] (2 responses)

Out of curiosity: Why not Gerrit?

Investigating GitLab

Posted Dec 16, 2018 0:22 UTC (Sun) by ceplm (subscriber, #41334) [Link]

https://softwareengineering.stackexchange.com/q/173262/71831

We tried to use it for some time internally and it was such a pain in ... that we have happily switched to GitLab. Pull/Merge requests are a way more pleasant experience.

Investigating GitLab

Posted Dec 18, 2018 17:00 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

We used Gerrit for a while. We had custom patches that we mentioned at a Google meetup and got a verbal confirmation that they'd be acceptable (as an idea at least). When we went to submit them, they were rejected (as an idea), but we had come to rely on them, so we were stuck on a dead-end fork. The patches were for better topic-level maintenance and review instead of the commit-focused Gerrit workflow. It also didn't support things like rebasing well (interdiff helps here, but who knows when the JGit implementation will do that part). Plus, due to the `next` branch management, merging was done via a side thing (the Gitosis 1.x bit with server-side scripts).

Other than that, I personally really dislike Gerrit for its JavaScript-heavy bits (it took us months to get a static archive of our instance for posterity due to the thing being unusable without JavaScript; the LWN article about web archivers helped here). Marking up every commit with Change-Id for tracking was…noise to me and required client-side hooks to add the things with extra code to not add duplicates. Dependent branches were nigh unsupported (erroring out on duplicate Change-Ids). Email notifications were busted beyond belief (duplicate Message-IDs for different contents). I just never enjoy interacting with Gerrit (I've had other comments on LWN about it as well).

I don't know the current state of Gerrit (my last Qt contribution was over a year ago), but it lives in the same realm as Phabricator to me: contribution is OK, but it's never "fun"; I certainly never want to review code via them at this point though.

Investigating GitLab

Posted Dec 6, 2018 6:02 UTC (Thu) by himi (subscriber, #340) [Link] (4 responses)

I'm not sure if I've been missing something with gitlab and gitlab-ci, but the per-branch nature of the gitlab-ci configuration becomes problematic when doing a merge-request - the source branch .gitlab-ci.yml file ends up being merged on top of the target branch version. This can certainly be worked around, but I haven't found a way to do so that isn't a pain.

Other than that I've found gitlab fairly nice to work with, at least for the small projects I've been dealing with at work.

Investigating GitLab

Posted Dec 6, 2018 9:58 UTC (Thu) by laarmen (subscriber, #63948) [Link] (2 responses)

That's funny; for us it's actually an advantage: if you make changes that need to be reflected in the CI configuration, you ship them along in your branch. Configuration changes that are local to a fork should be done using GitLab CI (protected) variables.

Investigating GitLab

Posted Dec 6, 2018 20:11 UTC (Thu) by himi (subscriber, #340) [Link] (1 responses)

Our use of gitlab-ci is still quite immature, so there probably are things that we're missing or have misunderstood. In our defense, the documentation isn't great . . .

I understand the logic behind treating the CI config as part of the repository, tied to the code that it's testing, but the actual implementation is what's left me a bit frustrated.

GitLab CI fork interaction and Samba

Posted Dec 8, 2018 8:52 UTC (Sat) by abartlet (subscriber, #3928) [Link]

In Samba, we have two files:
- .gitlab-ci.yml
- .gitlab-ci-private.yml

The second file references a tag called private, and includes the first.

On repositories with access to our private runners (we host on gitlab.com and use their shared runners where we can) we just change the configured CI file, but this setting doesn't persist over a fork, so everyone else gets a subset by default.

Not perfect, but working pretty well so far.

Samba's own particular way of using GitLab is here: https://wiki.samba.org/index.php/Samba_CI_on_gitlab
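In case it helps anyone, the shape of that two-file scheme is roughly the following (a hypothetical sketch, not our actual files — those are linked from the wiki page):

```yaml
# .gitlab-ci-private.yml -- hypothetical sketch of the two-file scheme
include:
  - local: .gitlab-ci.yml          # pull in the public job definitions

private-build:
  tags:
    - private                      # only runners registered with this tag run it
  script:
    - make test
```

Forks that don't point their CI configuration at this file simply never see the `private`-tagged jobs, which is how the subset-by-default behaviour falls out.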

Investigating GitLab

Posted Dec 6, 2018 10:25 UTC (Thu) by k8to (guest, #15413) [Link]

I think this problem is a reflection of two things.

It's hard to get continuous integration right. There are many pitfalls and few good guidebooks.

The gitlab ci docs are very spare. Fully explaining how the machinery works is not a task the docs take on. You get to find out by trial and error. Trying to schedule more than one type of periodicity and reuse some logic on pull request is pretty unclear, for example.
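For what it's worth, one shape that eventually works — discovered by trial and error — is YAML anchors plus `only:`/`except:` conditions (job names and script here are made up):

```yaml
.test-template: &test-template     # shared logic, reused via a YAML anchor
  script:
    - ./run-tests.sh

on-push:
  <<: *test-template
  except:
    - schedules                    # runs on ordinary pushes only

nightly:
  <<: *test-template
  variables:
    FULL_SUITE: "1"
  only:
    - schedules                    # runs only from a scheduled pipeline
```

None of that is spelled out in one place in the docs; you have to assemble it from the anchor, `only`/`except`, and scheduling pages separately.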

All the CI systems are fairly opaque, so this one being so as well wasn't a shock, but it has so little functionality that I was quite let down it wasn't properly explained.

One hope I have for greater open source adoption of gitlab is for more developers to spend time on cleanup and minor feature work. I don't think it would take that much critical mass to exceed the rate of progress of github.

Investigating GitLab

Posted Dec 6, 2018 8:06 UTC (Thu) by diconico07 (guest, #117416) [Link] (7 responses)

This article reminds me of Greg K.H defending the email-based workflow a few years ago (https://lwn.net/Articles/702177/), and it looks like the same issues will arise, namely:
- You need to register with *insert project/subsystem name*'s GitLab to be able to post a patch/bug report/review
- You need a reliable and constant internet connection to be able to work with it
And these are the more visible issues. So I think these tools are useful, but they lack a disconnected workflow and highly distributed identity management.
Moreover, it would mean having a single GitLab instance, as (as far as I know) you can't do MRs that stretch across multiple GitLab instances: if we assume it is used by the kernel and the related user-space stack, we can't assume every part of that ecosystem will be on the same instance, so you lose the "linked" MR feature.
As for the history of a patch series, GitLab rapidly makes it unreadable, especially if you rebase to follow the source branch (this can be mitigated by making a separate push for the rebase, but here again, you lose the offline workflow).

Investigating GitLab

Posted Dec 6, 2018 13:08 UTC (Thu) by Lennie (guest, #49641) [Link]

> you can't do MRs that stretch across multiple GitLab instance

Federating GitLab would be great, especially if the protocol could also be supported by something like Gitea, which is more than small enough to run on any system. And if the protocol supported something asynchronous, then it could be used without that permanent Internet connection as well.

Investigating GitLab

Posted Dec 6, 2018 15:45 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> For the history of patch series, GitLab rapidly make it unreadable, especially if you rebase to follow source branch

There's work on using the new git range-diff for intra-MR diffs.
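For anyone who hasn't seen it, here is a self-contained demo of what range-diff shows (throwaway repo, made-up file contents; needs git 2.19 or later):

```shell
# Build a tiny repo with two versions of the same one-patch series,
# then compare them with "git range-diff".
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev

echo base > file.c
git add file.c
git commit -qm "base"
base=$(git rev-parse HEAD)           # common ancestor of both versions

git checkout -qb v1 "$base"          # first version of the series
echo "feature (with a typo)" >> file.c
git commit -qam "add feature"

git checkout -qb v2 "$base"          # reworked version after review
echo "feature, typo fixed" >> file.c
git commit -qam "add feature"

# Show how the series changed between v1 and v2, commit by commit:
git range-diff "$base" v1 v2
```

The output pairs up matching commits across the two versions and shows the diff-of-diffs for each, which is exactly the "what changed since the last round of review" view that rebases otherwise destroy.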

Investigating GitLab

Posted Dec 8, 2018 6:50 UTC (Sat) by marcH (subscriber, #57642) [Link] (4 responses)

> So I think these tools are useful but lacks the not connected workflow, and the highly distributed identity management.

What is "distributed identity management"?

Investigating GitLab

Posted Dec 8, 2018 12:08 UTC (Sat) by farnz (subscriber, #17727) [Link] (3 responses)

Basically the idea that my identity is not proven by a central platform. No need to register for accounts with GitLab's (or whoever's) choices of account provider - just use your choice of account provider, and have the system attach your work to that ID (say an e-mail address).

Investigating GitLab

Posted Dec 8, 2018 16:11 UTC (Sat) by marcH (subscriber, #57642) [Link] (2 responses)

Email identity is distributed but it's not "managed" or "proven", that's why spam for instance exists.

For actual distributed identity management look at OpenID, Oauth2, LaunchPad, Facebook connect, late Persona, etc. already in use in some places like github, openstack etc.

Investigating GitLab

Posted Dec 9, 2018 5:34 UTC (Sun) by daniels (subscriber, #16193) [Link] (1 responses)

Exactly - freedesktop.org's GitLab instance offers GitHub, Google, gitlab.com, and Twitter, for external identity provision. One of the benefits of using these services is that they have much better spam prevention than we could ever offer.

You can also register locally using email verification, which at least sets a higher bar against spam sign-ups than an external OAuth hook does. So far for us, it's been fairly easy to spot and blacklist individual domains which have been used for spam accounts. I don't think the same would be true of decentralised HTTP-based identity systems.

Investigating GitLab

Posted Dec 10, 2018 8:49 UTC (Mon) by diconico07 (guest, #117416) [Link]

Those authentication methods are good, but they still create an account on every instance, so you then have to comply with the GDPR for storing that account-related information.

Thinking over this again, it also brings another issue (not really for the authentication part though):
Using a centralized system also brings the need to maintain it, and to host it, with enough horsepower to handle everyone.
Moreover, you need to carefully choose your provider if you want to be available from everywhere (see https://about.gitlab.com/2018/07/19/gcp-move-update/ for what I mean by this)

Investigating GitLab

Posted Dec 6, 2018 14:22 UTC (Thu) by champtar (subscriber, #128673) [Link] (2 responses)

> One downside is that Docker, which is used for CI, is "not so great". In theory, it gives full control over the build environment, but if you need to build Docker images inside Docker, the "root escape hatch is needed, which is not so great for a shared hosting setup".

You need to take a look at Buildah (and Podman)

Investigating GitLab

Posted Dec 6, 2018 16:25 UTC (Thu) by debfx (subscriber, #67022) [Link]

> You need to take a look at Buildah (and Podman)

Or kaniko which is nicely documented by GitLab: https://docs.gitlab.com/ee/ci/docker/using_kaniko.html

Investigating GitLab

Posted Dec 9, 2018 5:36 UTC (Sun) by daniels (subscriber, #16193) [Link]

> You need to take a look at Buildah (and Podman)

We are! libinput, for example, already uses this. Docker is an easy on-ramp due to its ubiquity, but there are better tools around which we're hoping to have used more widely.

Investigating GitLab

Posted Dec 7, 2018 2:19 UTC (Fri) by ruscur (guest, #104891) [Link] (1 responses)

I don't think some of the things mentioned in this article about Patchwork are entirely accurate.

It's not read-only, and it can be an accurate view of the state of the list, updated when patches are merged, under review, superseded, rejected, etc. Take a look at linuxppc-dev, for instance: http://patchwork.ozlabs.org/project/linuxppc-dev/list/

To be easy for maintainers, this requires things like git hooks and magic to sync with Patchwork, which is non-trivial, but once set up it is good.

I don't think test results are buried in Patchwork either. They can be, if you're only talking about things like 0-day email replies to patches, but you can send test results directly to Patchwork and have them show up both in the patch list (as a Success/Warning/Failure count) and at the top of the patch when you inspect it. Again, go to an individual patch on the linuxppc-dev list and have a look at the checks; you can filter patches based on author or whatever you want.

Patchwork also supports tagging in the API so you could filter based on what branches patches are intended for, if you wanted.

I'm not going to try and argue against moving some kernel development away from mailing lists because obviously you do get a lot of nice things with a more modern solution like GitLab, but a *lot* of the niceties you get from doing that you can have without changing the way your developers do things.

(disclaimer: I run the CI system that sends things to Patchwork on linuxppc-dev and some other lists, and am plugging https://github.com/ruscur/snowpatch/ if you want to automate running tests on incoming patches and sending the results to Patchwork)

Investigating GitLab

Posted Dec 7, 2018 11:29 UTC (Fri) by blackwood (guest, #44174) [Link]

We do use all of that on fd.o's Patchwork too. I think we even looked at snowpatch when we started setting everything up, but then ended up rolling our own glue. The problem is still that it's all different disjoint views that are hard to keep in sync.

Re: the Patchwork status: at least for us, the source of truth is still the mailing list/git repo; setting a patch series to "accepted" doesn't also merge it. It's just that you need humans to handle all the corner cases, which is why you can change it through Patchwork. But in the end it's still only trying to reflect information that's stored somewhere else, albeit with a bit more structure.


Copyright © 2018, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds