Debian discusses vendoring—again
The problems with "vendoring" in packages—bundling dependencies rather than getting them from other packages—seem to crop up frequently these days. We looked at Debian's concerns about packaging Kubernetes and its myriad of Go dependencies back in October. A more recent discussion in that distribution's community looks at another famously dependency-heavy ecosystem: JavaScript libraries from the npm repository. Even C-based ecosystems are not immune to the problem, as we saw with iproute2 and libbpf back in November; the discussion of vendoring seems likely to recur over the coming years.
Many application projects, particularly those written in languages like JavaScript, PHP, and Go, tend to have a rather large pile of dependencies. These projects typically simply download specific versions of the needed dependencies at build time. This works well for fast-moving projects using collections of fast-moving libraries and frameworks, but it works rather less well for traditional Linux distributions. So distribution projects have been trying to figure out how best to incorporate these types of applications.
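As a purely hypothetical sketch of that workflow, a web front-end's package.json might look something like the following; the package names and version numbers are chosen only for illustration and are not taken from gsa's manifest:

    {
      "name": "example-web-frontend",
      "version": "1.0.0",
      "dependencies": {
        "react": "16.13.1",
        "react-dom": "16.13.1"
      },
      "devDependencies": {
        "webpack": "4.44.2",
        "webpack-cli": "3.3.12"
      },
      "scripts": {
        "build": "webpack --mode production"
      }
    }

Running npm install (or npm ci against a lock file) resolves and downloads those modules, along with their own transitive dependencies, from the npm registry before the build step can run at all; that network access during the build is exactly what Debian's policy disallows.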
This time around, Raphaël Hertzog raised the issue with regard to the Greenbone Security Assistant (gsa), which provides a web front-end to the OpenVAS vulnerability scanner (which is now known as Greenbone Vulnerability Management or gvm).
The Debian policy forbids download during the build so we can't run the upstream build system as is.
Hertzog suggested three possible solutions: collecting all of the dependencies into the Debian source package (though there would be problems creating the copyright file), moving the package to the contrib repository and adding a post-install step to download the dependencies, or removing gsa from Debian entirely. He is working on updating gsa as part of his work on Kali Linux, which is a Debian derivative that is focused on penetration testing and security auditing. Kali Linux does not have the same restrictions on downloading during builds that Debian has, so the Kali gsa package can simply use the upstream build process.
He would prefer to keep gsa in Debian, "but there's only so much busy-work that I'm willing to do to achieve this goal". He wondered if it made more sense for Debian to consider relaxing its requirements. But Jonas Smedegaard offered another possible approach: analyzing what packages are needed by gsa and then either using existing Debian packages for those dependencies or creating new ones for those that are not available. Hertzog was convinced that it wouldn't get done, but Smedegaard said that the JavaScript team is already working on that process for multiple projects.
Hertzog ran the analysis script described on that page and pointed to the output from the package.json file of gsa. He said that it confirmed his belief that there are too many dependencies to package; "Even if you package everything, you will never ever have the right combination of version of the various packages."
To many, that list looks daunting at best, impossible at worst, but Smedegaard seemed unfazed, noting several reasons to believe that those dependencies can be handled. But Hertzog pointed out that the work is not of any real benefit, at least in his mind. He cannot justify spending lots of time packaging those npm modules (then maintaining them) for a single package "when said package was updated in Kali in a matter of hours". He thinks the distribution should focus its efforts elsewhere.
He said that Debian is failing to keep up with the paradigm change in these other ecosystems, which means that "many useful things" are not being packaged. Pirate Praveen agreed that there are useful things going unpackaged, but disagreed with Hertzog's approach of simply using the upstream download-and-build process. Praveen thinks that a mix of vendoring (bundling) for ultra-specific dependencies and creating packages for more generally useful modules is the right way forward. It comes down to distributions continuing to provide a particular service for their users.
One of the reasons Smedegaard felt that the dependencies for gsa could be handled via Debian packages is that gsa, like other large projects, tends to overspecify the versions required; in many cases, other versions (which might already be packaged for Debian) work just fine. But figuring that out is "a substantial amount of work", Josh Triplett said in a lengthy message. He cautioned against the "standard tangent" where complaints about the number of dependencies for these types of projects are aired.
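A hypothetical fragment (the module names and versions are invented here) shows the difference between an exact pin and a semver range that an already-packaged version could satisfy:

    {
      "dependencies": {
        "some-widget": "4.2.1",
        "other-widget": "^4.2.0"
      }
    }

The first entry accepts only release 4.2.1; the caret range on the second accepts any 4.x release from 4.2.0 onward, so a 4.5.0 already in the Debian archive would do. Determining whether a project that pins exact versions truly needs them, or would work just as well with whatever Debian ships, is much of the work that Triplett is pointing to.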
He said that disregarding a project's guidance on the versions of its dependencies is fraught, especially for dynamically typed languages where problems may only be detected at run time. For those ecosystems, the normal Debian practice of having only one version of a given library available may be getting in the way; relaxing that requirement somewhat could be beneficial.
Triplett outlined the problems that developers encounter when trying to package a project of this sort. They can try to make it work with the older libraries available in Debian, upgrade the libraries in Debian and fix all of the resulting problems in every package that uses them, or simply bundle the required libraries. The first two are enormously difficult in most cases, so folks settle for bundling, which is undesirable but unavoidable.
Adrian Bunk is concerned about handling security problems in a world with multiple library versions. These ecosystems, he said, seem uninterested in supporting stable packages for the three to five years needed by distributions such as Debian stable or Ubuntu LTS. More library proliferation (version-wise) just means more work for Debian when the inevitable CVE comes along.
But Triplett said that he is not expecting there to be a lot of different library versions, just that at times it might make sense to have more than one:
That's different from requiring *exactly one* version of xyz, forcing all packages to transition immediately, and preventing people from uploading packages because they don't fork upstream and port to different versions of dependencies.
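To make the tension concrete, consider a hypothetical pair of applications (names and versions invented here) that need different major versions of the same npm module:

    {
      "name": "app-one",
      "dependencies": { "widget-lib": "^2.0.0" }
    }

    {
      "name": "app-two",
      "dependencies": { "widget-lib": "^3.0.0" }
    }

Under a strict one-version rule, packaging app-two would mean immediately porting app-one (and anything else using widget-lib) to the 3.x API; allowing packages for both major versions to coexist in the archive for a while, as Triplett suggests, avoids that forced transition, at the price of tracking security fixes in both.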
It seems safe to say that few minds were changed in the course of the discussion. Bunk and Triplett seemed to talk past each other a fair bit. And no one spoke up with some wild new solution to these problems. But the problems are not going to disappear anytime soon—or ever. Without some kind of shift, bundling will likely be the path of least resistance, at least until some hideous security problem has to be fixed in enough different packages that bundling is further restricted or prohibited. That would, of course, then require a different solution.
The approach currently being taken by Smedegaard, Praveen, and others to tease out the dependencies into their own packages has its attractions, but scalability and feasibility within a volunteer-driven organization like Debian are not among them. The size and scope of the community creating open-source software are vastly larger than Debian or any of its language-specific teams, so it should not come as a surprise that the distribution is not keeping up. Debian is hardly alone in facing this problem, of course; it is one that the Linux distribution community will continue to grapple with.