
Rationalizing Python packaging

By Jonathan Corbet
October 16, 2013
The Python language comes with a long list of nice features, in keeping with the language's "batteries included" mantra. One battery that is noticeably absent, though, is a comprehensive mechanism for the building, distribution, and installation of Python packages. That leaves packagers and users having to choose between a variety of third-party tools or just giving up and solving the whole problem themselves. The good news is that Python 3.4 is likely to solve this problem, but Python 2 users may still have to go battery shopping on their own.

Python packaging has long been recognized as a problem for users of the language. There is an extensive collection of add-on modules in the Python Package Index (PyPI), but there is no standard way for a user to obtain one of those modules (and, crucially, any other modules it depends on) and install it on their system. The distutils package — the engine behind the nearly omnipresent setup.py files found in modules — can handle some of the mechanics of installation, but it is showing its age and lacks features. Distutils2 is a fork of distutils intended to solve many of the problems there, but this project appears to have run out of steam. Setuptools is a newer approach found on many systems, but it has a long list of problems of its own. Distribute is "a deprecated fork" of Setuptools. And so on; one does not need to look for long to see that the situation is messy — and that's without looking at the variety of package formats ("egg," "wheel," etc.) out there.
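
For reference, the distutils interface behind those setup.py files is small; a minimal, hypothetical example might look like:

    # setup.py: a minimal, hypothetical distutils-based module
    from distutils.core import setup

    setup(
        name="example",
        version="1.0",
        py_modules=["example"],
    )

Running "python setup.py install" copies the module into the interpreter's library directory; everything beyond that (dependency resolution, downloads) is where the third-party tools come in.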

For a while, the plan was to complete work on distutils2 and merge the result into the Python 3.3 release. But, in June 2012, that effort collapsed when it became clear that the work would not be anywhere near complete in time. The results were a 3.3 release without an improved packaging story, an epic email thread on the nature of the problem and what should be done about it, and a virtual halt to distutils2 work.

PEP 453

Well over one year later, a solution appears to be in sight; it takes the form of PEP 453, which, barring some unforeseen glitch, should be officially approved in the near future. This proposal, written by Donald Stufft and Nick Coghlan, charts the path toward better Python package management.

One might start by wondering why such a thing is needed in the first place. Linux users, of course, already have systems with nice package management built into them. But the world is full of users of other operating systems that lack comprehensive packaging systems. And, even on Linux, even on Debian, one is unlikely to find distribution packages for all 35,690 modules found in PyPI, so Linux users, too, are likely to have to install modules outside of the distribution's packaging system. It would seem that there is a place for a package distribution mechanism for Python modules, much like the one the Perl community has long had with CPAN.

PEP 453 calls for that mechanism to be built on PyPI using the pip installer. Pip, which is already in wide use, avoids a number of the problems found in its predecessors (though pip is based on Setuptools — a dependency that is expected to go away over time). It does not attempt to solve the whole problem, so complicated programs with non-Python dependencies may still end up needing a more comprehensive tool like Buildout or conda. But, for most users, pip should be more than adequate. And, by designating pip as the officially recommended installer, the PEP should help to direct resources toward improving pip and porting modules to it.
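
For those who have not used it, pip's command-line interface is simple; a quick sketch (the package name is only an example):

    # install a module and its dependencies from PyPI
    pip install requests

    # upgrading and removal work the same way
    pip install --upgrade requests
    pip uninstall requests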

Pip will become a part of the standard Python distribution, but in an interesting way. A copy of pip will be hidden away deep within the Python library; it can then be installed into the system using the (also included) ensurepip module. Anybody installing their own version of Python can optionally use ensurepip to install pip; otherwise they can get it independently or (for Linux users) rely on the version shipped by the distributor. Python will also include a bundle of certificate-authority certificates to verify package sources, though the PEP envisions distributors wanting to replace that with their own central CA certificate collection. For as long as pip needs Setuptools, that will be bundled as well.
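
Assuming the PEP is implemented as described, bootstrapping the bundled pip should be a one-liner; a sketch of the expected usage:

    # install the bundled copy of pip into this Python installation
    python -m ensurepip

    # maintenance releases can then refresh it
    python -m ensurepip --upgrade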

This scheme thus calls for pip to be distributed with Python, but it will not strictly become a part of Python. It will remain an independently developed project that, it is expected, will advance more quickly than Python and make more frequent releases. Python's 18-month cycle was seen as being far too slow for a developing utility like pip, so the two will not be tied together. There is a plan to include updated versions of pip in Python maintenance releases, though, to ensure that security fixes get out to users eventually.

Pip for Python 2

Perhaps the most controversial part of earlier versions of this PEP was a plan to include a version of ensurepip in the next Python 3.3 and 2.7 releases as well. The motivation for this idea is clear enough: if pip is to be the standard Python package manager, it would be nice to make it easily available to all Python users. As much as the Python developers would like to see everybody using Python 3, they have a realistic view of how long it will really take for users — especially those with existing, working applications — to move off Python 2. Putting ensurepip into (say) Python 2.7.6 would make it easier for Python 2 developers to work with the official packaging system.

On the other hand, Python 2 is currently being maintained under a strict "no new features" policy; adding ensurepip would require an explicit exception that, some developers fear, could open the floodgates for similar requests from developers of other modules. There are also worries that, once ensurepip goes in, some versions of Python 2.7 will have different feature sets than others, creating confusion for application developers and users. And, though they were not in the majority, some developers clearly do not want to do anything that might encourage developers to stay with Python 2 for any longer than necessary. These concerns led to substantial opposition to adding ensurepip to point releases of older Python versions.

The end result is a compromise: the documentation for Python 3.3 and 2.7 will be updated to anoint pip as the standard package manager, but no other changes will be made to those versions — for now. Nick has stated his intent to put together a separate PEP to revisit the idea of bundling pip with Python 2.7 once the (relatively uncontroversial) question of getting pip into the 3.4 release is resolved.

Assuming there are no major disagreements, that resolution should happen soon. It needs to: the Python 3.4 release schedule calls for the first beta release — and associated feature freeze — to happen on November 24. The actual 3.4 release is currently planned for late February; after that, Python developers and users should have a standardized packaging and distribution scheme for the first time. "Better late than never" certainly applies in this case.



Rationalizing Python packaging

Posted Oct 16, 2013 21:27 UTC (Wed) by Tobu (subscriber, #24111)

Here is a good place to profess my love of pip, and particularly the way it has unified the mess by focusing on source packages. No leaky restrictions from intermediate formats, and if something needs to be fixed, the user sees exactly what the developer sees.

I haven't followed this completely; I assume the plan is to keep the user-visible parts of setuptools hidden, pending their eventual deprecation? Because I really don't want people to learn or try to “fix” easy_install whenever it turns out to be broken (or break other things, like the search path).

Rationalizing Python packaging

Posted Oct 16, 2013 22:00 UTC (Wed) by corbet (editor, #1)

Yes, the plan is to keep setuptools hidden - to an extent. They want to bundle an unmodified version, so easy_install will still be there. Just don't use it :)

Rationalizing Python packaging

Posted Oct 17, 2013 5:37 UTC (Thu) by euske (guest, #9300)

Personally, I don't like a language's packaging system and a Linux distribution's packaging system meddling with each other. Isn't this, traditionally, a distro's job? I still don't see how distros with their own package-signing systems can collaborate with this mechanism. The place where pip packages are installed should be separated from distro-supplied packages (e.g. /var/lib/pip). And what if every other programming language starts doing a similar thing?

Rationalizing Python packaging

Posted Oct 17, 2013 5:59 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)

Basically, all other languages _already_ do something like this. Java has Maven, C# has NuGet, Perl has CPAN, and so on.

That's because distros are totally _abysmal_ at that. For example, with Java I often need to package our internal software or modified versions of third-party libraries. It's easy with Maven - just add your repo and off you go.

With Debian? I don't even know where to begin. PPAs help a little, but they are too complicated for that. Oh, and I actually work on Mac OS X, and some of our developers prefer Fedora or Gentoo. How do we deal with that?

Rationalizing Python packaging

Posted Oct 17, 2013 9:12 UTC (Thu) by lkundrak (subscriber, #43452)

I presume your colleagues with Mac OS X on Maven only ever use plain Python code without ever having to depend on a native library. Good for them.

Rationalizing Python packaging

Posted Oct 17, 2013 11:45 UTC (Thu) by dstufft (guest, #93456)

Generally they'll just install the native code using Homebrew, MacPorts, or Fink and then install the Python code. Not as handy as the Linux tools, but on the other hand the Linux tooling doesn't work on OS X anyway, and even if it did, it wouldn't work inside a virtual environment. On top of that, there are systems that support non-Python dependencies (such as Conda), but these are considered alternative installers that people can install using pip if they wish.

System packages have their place, but a fair number of people want their application code managed separately from the system.
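
As a sketch of that split (the library names here are only examples): the native half comes from the system package manager, the Python half from pip:

    # native libraries from Homebrew...
    brew install libxml2 libxslt
    # ...and the Python binding from pip, compiled against them
    pip install lxml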

Rationalizing Python packaging

Posted Oct 17, 2013 16:07 UTC (Thu) by drag (guest, #31333)

> I presume your colleagues with Mac OS X on Maven only ever use plain Python code without ever having to depend on a native library. Good for them.

That's a ridiculous thing to say. Pip handles compiling against native code just fine, even though it's opaque about errors when you are missing dependencies. Even then it's still a big win over distribution-packaged versions.

Rationalizing Python packaging

Posted Oct 17, 2013 16:42 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)

We have several libraries (like gevent) that require compilation. Pip deals with that just fine.

Maven doesn't deal with compilation, but it can be used to get pre-compiled native artifacts.

Rationalizing Python packaging

Posted Oct 17, 2013 12:39 UTC (Thu) by epa (subscriber, #39769)

If I need to install a Perl module from CPAN I package it into an RPM using a tool such as cpanspec. The source RPM and the built binary are both kept in version control. Then you can use the standard package management tools for upgrading, installing a new machine and so on. There are equivalents for Debian, etc.
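
A rough sketch of that workflow (the module name is hypothetical, and cpanspec options and paths vary):

    # generate an RPM spec file for a CPAN module...
    cpanspec Some::Module
    # ...then build source and binary RPMs from it
    rpmbuild -ba perl-Some-Module.spec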

I have to admit that for purely internal code I don't bother with packaging up as modules but instead just keep it in version control and push out new versions to the deployment directory as needed.

Rationalizing Python packaging

Posted Oct 17, 2013 15:18 UTC (Thu) by raven667 (subscriber, #5198)

This is what I do too, although following deep CPAN dependency trees and running cpan2rpm on everything is tedious at best; madness comes from mixing and matching between OS-packaged and language-packaged software. Either make OS packages and use the OS-packaged language runtime, or build and install the language runtime in /opt or wherever and manage it outside the OS entirely.

I will also say that the metadata that tools such as CPAN, distutils, and probably Maven use is translatable between packaging systems. It seems to me that it would be most productive to concentrate on the machine-translation bits, so that you can build good OS-native packages (RPM, DEB, ebuild, whatever) using just the metadata that the language runtime already has. If that works well enough, then even the runtime distributed by the OS vendor could be built using the automated tools without hand modification, so that when someone else wants to add or change packages they don't run into conflicts, such as incomplete or incompatible dependency information between CPAN and RPM for the same software.

Rationalizing Python packaging

Posted Oct 17, 2013 8:56 UTC (Thu) by garrison (subscriber, #39220)

I agree that it is annoying to have the distro and language packaging mechanisms overlapping. For this reason, most people using pip are actually using it inside of a virtualenv so that it is isolated from the system.
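
The usual pattern looks something like this (the names are only examples):

    # create an isolated environment and work inside it
    virtualenv myenv
    source myenv/bin/activate
    pip install requests    # lands in myenv, not the system tree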

Rationalizing Python packaging

Posted Oct 17, 2013 13:29 UTC (Thu) by cortana (subscriber, #24596)

At least in Debian, they don't overlap in too bad a way. Debian packages of Python modules go to /usr/lib/python2.7/dist-packages; anything from pip, or using distutils, etc., installs modules to /usr/local/lib/python2.7/dist-packages. So at least you don't have to worry about the two mechanisms arguing over who owns which files any more.

(As for /usr/local/lib/python2.7/site-packages, in case you wondered: it is not used by the Debian-distributed version of Python at all; it is used when a user installs modules with a manually-built Python interpreter.)
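
One quick way to see which directories a given interpreter actually searches is the standard site module:

    # print sys.path and the per-user site-packages configuration
    python -m site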

Rationalizing Python packaging

Posted Oct 17, 2013 16:04 UTC (Thu) by drag (guest, #31333)

> I agree that it is annoying to have the distro and language packaging mechanisms overlapping. For this reason, most people using pip are actually using it inside of a virtualenv so that it is isolated from the system.

This.

I use 'virtualenv' + 'pip' and it SIGNIFICANTLY increased the utility of Python for me.

Major advantages of doing 'sandboxed' approach:

* All the systems I use will end up using as close to the same software as possible. I deal with various versions of Red Hat, my Fedora desktop, and a couple of Debian/Ubuntu machines. Using pip + virtualenv ensures that the software matches across all these machines as closely as possible (a freeze/requirements sketch appears after this list).

* The majority of Python libraries and whatnot are not packaged by distributions. So if I relied on distribution packaging, I would be forced to manage a huge amount of the software by hand or through ad-hoc scripts.

* For the software that distributions DO actually package they have random versions that are usually going to be mostly worthless for what I need to do.

* I can trivially set up multiple 'sandboxed' environments for different projects. I don't have to use full-fledged virtual machines. I can have a different project and a different set of software enabled in different terminals. It's extremely easy and, more importantly, extremely _quick_. Do not underestimate the utility of being able to do things very quickly or easily. This makes it possible to do new things that are not practical with a more cumbersome setup.

* I can maintain a particular environment for a particular application that I've worked on. If I want to update my default virtualenv that I use for ipython and such things then I don't have to worry about that breaking some old Flask project I haven't worked on in months.
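
A common way to keep such environments in sync across machines (the file name is only a convention) is pip's freeze/requirements mechanism:

    # record exact versions from one environment...
    pip freeze > requirements.txt

    # ...and reproduce them in a fresh virtualenv elsewhere
    virtualenv env
    source env/bin/activate
    pip install -r requirements.txt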

That's the stuff I can think of off the top of my head. The 'virtualenv' and 'pip' approach is a huge win. It's too bad that this sort of thing isn't available for everything. Even if you don't have an easy way to 'sandbox' things, there are huge advantages in being able to use alternative packaging schemes for commonly used Linux software.

For example, if I had to depend on apt-get or yum to install Emacs modules for me, I would never actually have used Emacs. Emacs would be almost completely unusable.

Every time I tried to use that editor in the past, I would get thwarted by the differences between the Linux distributions I was forced to use it on. The Emacs versions themselves, the modules available, and the versions available on different systems were virtually random. And it's not even an issue of 'use tramp' (which I do use all the time), 'put Emacs in tmux/screen' (the GUI version of Emacs is significantly better than the terminal version), or 'use Emacs over X over ssh' (X networking is miserable).

I use different workstations. I want to use it on my work laptop, my work workstation, my home desktop, my home laptop, and my backup remote system I have setup for work when I can't get to any of the above. Even getting really decent python support for Emacs was pretty miserable.

Nowadays the most painful thing to deal with is installing Emacs itself if the distribution doesn't offer a new enough version. I can use git for the actual configs and make sure they are available on all the systems, and I can depend on MELPA and Emacs package management to automatically set everything up in as close to the same way on all my systems as possible. If I have an issue I can google it, and even if the blog post or question was asked and answered by a user on a Windows or OS X machine, I can still follow along on my Linux systems with the same commands and it will usually work out fine.

Nowadays people new to Emacs can be told: "Install Emacs. Make sure it's 24 or newer. Make sure you have 'package' enabled, add the gnu, marmalade, and melpa repos, and then install the emacs-starter-kit from there and you are about 90% of the way there." I don't have to go into specifics on whether they are using Ubuntu, Debian, Red Hat, Fedora, or Arch. It's irrelevant. The same advice even applies, mostly unchanged, to people using OS X and Windows.

Something like that isn't even remotely possible if you try to depend on a Linux distribution's provided Emacs modules. I run into the same issues with Java or Perl or whatever else I try to use on Linux. It's almost always a huge advantage to ignore the distribution-packaged versions of software as much as possible, unless you are dealing with the most popular, mainstream packages.

Rationalizing Python packaging

Posted Oct 17, 2013 17:30 UTC (Thu) by jimparis (guest, #38647)

I agree with all of your points, but it's also worth pointing out some of the disadvantages. I don't like using different software for dealing with Ruby packages, Python packages, system packages, Emacs packages, Perl packages, or anything else. It is extremely convenient to be able to install, remove, and upgrade them all with a single, consistent set of tools and user interface. I can also control the installation of system-wide packages with things like preseeding on the install media, while setting up a sandbox requires additional custom steps outside the scope of most management tools.

Furthermore, maintaining each individual sandbox can be a pain. All distro-supplied software on my system gets automatic security updates, but I need to remember to manually go into any sandboxed environments and do updates there as well.

In general, my Python use does not require extremely specific versions of modules, and so I use distro packages when I can, resorting to virtualenv only when needed.

Rationalizing Python packaging

Posted Oct 17, 2013 17:56 UTC (Thu) by mathstuf (subscriber, #69389)

Maybe RPM/dpkg should grow plugin infrastructure that triggers on certain package-name patterns (rubygem-$gem or python-pip-$package) and does the gem or pip gymnastics automatically to produce a system version?

Rationalizing Python packaging

Posted Oct 17, 2013 18:31 UTC (Thu) by utoddl (guest, #1232)

Could you not teach PackageKit to do this?

Rationalizing Python packaging

Posted Oct 17, 2013 19:52 UTC (Thu) by mathstuf (subscriber, #69389)

Probably, but the core packaging tool might want to know what else is poking around in /usr (or you could just jail pip and friends to /usr/local (or even /usr/local/$language)).

Rationalizing Python packaging

Posted Oct 17, 2013 18:15 UTC (Thu) by drag (guest, #31333)

The ultimate solution is to have a distributed, standardized, and distribution-agnostic method for downloading and sandboxing software.

That is, instead of building and packaging software into the main directory tree for each individual Linux distribution, there should be a place where developers can make software available to all Linux users, regardless of which distribution they are using, and install it in a manner that lets each piece of software be managed independently.

Rationalizing Python packaging

Posted Oct 17, 2013 11:42 UTC (Thu) by dstufft (guest, #93456)

Pip just asks Python where to install to. On Debian-based systems this means that pip *does* have a separate location. However, there is still some meddling: pip will notice a version installed by the distro and uninstall it (without telling the distribution it did so). I have it on my personal TODO list to write a PEP to make distribution packaging and pip play nicer together.
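
That question can be put to the interpreter directly; for example, using the distutils API of the day:

    # where packages installed for this interpreter will land
    python -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())"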

As someone else mentioned, this is only an issue when installing globally; it is not an issue at all when installing into a virtual environment.

Rationalizing Python packaging

Posted Oct 19, 2013 15:23 UTC (Sat) by ndye (guest, #9947)

> On Debian based systems this means that pip *does* have a separate location. However there is still some meddling as pip will notice a version installed by the distro and uninstall it (without telling the distribution it did this).

Really?  Why does pip feel the need to uninstall a distro-managed module?  Doesn't virtualenv or pip have something like LD_LIBRARY_PATH to tell a local app to use the pip-provided module instead of the distro's module?

(Sorry, perl's my thing, so I'm confessing complete ignorance of the Python framework.)

Rationalizing Python packaging

Posted Oct 23, 2013 17:31 UTC (Wed) by intgr (subscriber, #39733)

Normally when using pip you would manage Python packages in a separate virtualenv, in which case your distro packages are safe. The virtualenv can also optionally inherit system packages, but won't touch them.

This problem only happens when you use pip as root outside of a virtualenv. Anyone doing that needs some cluebat treatment.
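
The inheriting behavior mentioned above is chosen when the environment is created; a sketch:

    # an environment that can see, but never modifies, system packages
    virtualenv --system-site-packages myenv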

Rationalizing Python packaging

Posted Oct 23, 2013 18:10 UTC (Wed) by nix (subscriber, #2304)

... or is running their own Python installation (perhaps in a nonstandard prefix), in which case pip is the right way to manage it. :)


Copyright © 2013, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds