Python 3.2: toward the future of the language
Since Python is under a moratorium on the addition of new language features, one might think that a new release - even a major release - would be relatively boring. But the moratorium only applies to the core language; the libraries - which is where much of the interesting action is to be found - are unaffected. A look at the "What's New in Python 3.2" document indicates that the libraries are evolving quickly indeed. Some of the more significant changes include:
- A new "argparse" module for the handling of command-line options.
Those of us still using getopt have been left far behind; the current
"optparse" module has also been deprecated as of version 2.7. Argparse
would appear to go beyond mundane argument parsing into the creation
of command-line languages. It can probably handle more details than
most people will ever want to use.
- There is an ongoing effort to gather concurrency-related modules under
the "concurrent" namespace. The first addition there is concurrent.futures, a
mechanism for the submission and management of tasks in
multi-threaded and multi-process environments.
- The handling of compiled .pyc files has changed to reflect an
environment where multiple Python runtimes coexist. They now have the
interpreter name and version built into their names and have been
banished into a separate __pycache__ directory. There is a
similar mechanism for the handling of shared libraries.
- Many other modules have seen significant improvements; see the "what's new" document for details.
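As a rough illustration of the first two items in the list above, here is a minimal sketch that parses a command line with argparse and hands the resulting work to a concurrent.futures thread pool. The function names (build_parser, square, run) and the --workers option are invented for this example; only the argparse and concurrent.futures calls themselves come from the standard library.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor, as_completed


def build_parser():
    # Hypothetical command-line interface, made up for this sketch.
    parser = argparse.ArgumentParser(
        description="Square some integers in a thread pool.")
    parser.add_argument("numbers", type=int, nargs="+",
                        help="integers to square")
    parser.add_argument("-w", "--workers", type=int, default=2,
                        help="number of worker threads")
    return parser


def square(n):
    # Stand-in for a real task.
    return n * n


def run(argv):
    # Parse an explicit argument list rather than sys.argv so the
    # sketch can be called from other code as well.
    args = build_parser().parse_args(argv)
    with ThreadPoolExecutor(max_workers=args.workers) as pool:
        # submit() returns a Future for each task; remember its input.
        futures = {pool.submit(square, n): n for n in args.numbers}
        # as_completed() yields futures as they finish, in any order.
        return {futures[f]: f.result() for f in as_completed(futures)}


results = run(["3", "4", "5", "--workers", "2"])
for number in sorted(results):
    print(number, results[number])
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor should give the multi-process variant with the same submit() interface, though a process pool needs picklable tasks and, on some platforms, an `if __name__ == "__main__":` guard around the code that starts it.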
A couple of the most significant improvements may be elsewhere, though. One of those is the definition of a stable ABI for extension modules. Anybody who has been through a Python version update knows that the associated rebuilding of extension modules is not a lot of fun. As of version 3.2, modules which restrict themselves to a subset of the extension module ABI should continue to work indefinitely into the future. It's not yet clear how many real-world modules can live within the restrictions of this ABI; also unclear is how much that ABI could be extended without slowing further development of the language. But it's a step in the right direction toward the solution of a real problem.
Another partial solution to an ongoing problem can be found in the rewrite of the global interpreter lock (GIL). The GIL is Python's equivalent to the kernel's Big Kernel Lock; it ensures that only one thread can be executing in the bytecode interpreter at any given time. Since running bytecode is what Python programs do, the GIL can be seen as a rather significant constraint on how much concurrency is possible in a multi-threaded environment. Some extension modules release the GIL while they are doing extensive computations, and the GIL (like the BKL) is released while waiting for I/O, but that doesn't solve the real problem. The failure to remove (or at least reduce the role of) the GIL during the Python 3 development process is, for many developers, one of the biggest disappointments of Python 3.
The 3.2 GIL rewrite does not change the fundamental nature of the GIL, but it does reduce its impact somewhat. As described by Antoine Pitrou, the principal hacker behind this work, two significant changes have been made:
- Previously, the GIL would be passed from one contending thread to the
next after a certain number of opcodes had been executed. But opcodes
do not execute in constant time, and some of them (such as calls into
an extension module) can execute for a long time indeed. The new GIL
is, instead, passed on after a bounded time period (5ms by default).
- The GIL is implemented in an inherently unfair manner; once it has been released, any thread which comes along can claim it. Prior to 3.2, that "any thread" was often the thread which had just released the lock. That thread was supposed to wait before attempting to reacquire the GIL, but the fact that it was running and cache-hot meant it was still likely to get there first. The new GIL is still unfair, but it will at least force the releasing thread to wait until a contending thread has acquired the lock. That should fix some of the long latencies seen by Python programmers in some situations.
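The time-based handoff described above is visible from Python code: 3.2 adds sys.getswitchinterval() and sys.setswitchinterval(), which read and set the interval in seconds. The sketch below wraps them in a small context manager; the switch_interval helper is an invention for this example, not part of the standard library, and whether a shorter interval actually helps will depend on the workload.

```python
import sys
from contextlib import contextmanager


@contextmanager
def switch_interval(seconds):
    # Hypothetical helper: change the GIL switch interval for the
    # duration of a block, then put the previous value back.
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield previous
    finally:
        sys.setswitchinterval(previous)


default_interval = sys.getswitchinterval()
print("default switch interval: %.4f s" % default_interval)

# Ask for more frequent handoffs while latency-sensitive threads run.
with switch_interval(0.001) as old:
    inside = sys.getswitchinterval()
    print("inside the block: %.4f s" % inside)

restored = sys.getswitchinterval()
print("after the block: %.4f s" % restored)
```

A shorter interval trades throughput for responsiveness, since each handoff has a cost; the setting changes how often a switch is requested, not which thread wins the lock.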
Given the scalability limitations inherent in a single, global lock, one might think that eliminating that lock would be a priority for the Python developers. The Python glossary suggests that this isn't the case:
The addition of fine-grained locking which did not hurt single-threaded code could certainly be a bit of work; it might well involve techniques like run-time patching of the interpreter. For a system which is supposed to run on many operating systems, such a solution could indeed be brittle and hard to maintain. In its absence, though, the scalability of multi-threaded Python programs will continue to be limited.
That said, Python 3 is clearly getting better. Over time, adoption appears
to be on the increase; the number of distributions and modules which
support the language is growing. Even so, Python 3 continues to be a
sufficiently hard sell that a group of developers recently contemplated
reopening feature-oriented development on version 2.x, but that idea fell
by the wayside when it became clear that the developer interest wasn't
there. Python 3 thus appears to be the future for those who want a
language which continues to evolve. Based on what can be seen in the 3.2
release, that evolution is going full speed, even in the face of a
moratorium on new core features.