Python 3.2: toward the future of the language

By Jonathan Corbet
February 23, 2011
The Python 3.2 release was announced on February 20, exactly 20 years after 0.9.0, which was the first public Python release. Given that Python 2.x remains the version of the language used by most programmers and most existing code, one might be tempted to write off this release as being relatively unimportant. But the 3.2 release has some changes which will be important to Python developers going forward, so, even if one isn't planning on moving to Python 3 right away, this release merits a quick look.

Since Python is under a moratorium on the addition of new language features, one might think that a new release - even a major release - would be relatively boring. But the moratorium only applies to the core language; the libraries - which is where much of the interesting action is to be found - are unaffected. A look at the What's new in Python 3.2 document indicates that the libraries are evolving quickly indeed. Some of the more significant changes include:

  • A new "argparse" module for the handling of command-line options. Those of us still using getopt have been left far behind; the current "optparse" module has also been deprecated as of version 2.7. Argparse would appear to go beyond mundane argument parsing into the creation of command-line languages. It can probably handle more details than most people will ever want to use.

  • There is an ongoing effort to gather concurrency-related modules under the "concurrent" namespace. The first addition there is concurrent.futures, a mechanism for the submission and management of tasks in multi-threaded and multi-process environments.

  • The handling of compiled .pyc files has changed to reflect an environment where multiple Python runtimes coexist. They now have the interpreter name and version built into their names and have been banished into a separate __pycache__ directory. There is a similar mechanism for the handling of shared libraries.

  • Many other modules have seen significant improvements; see the "what's new" document for details.
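The argparse module mentioned above can be sketched briefly. This is a minimal illustrative example (the tool and option names are invented); it shows type conversion, flags, and positional arguments, which is where argparse goes beyond getopt:

```python
# Hypothetical command-line tool sketch using argparse (names are illustrative).
import argparse

parser = argparse.ArgumentParser(description="Frobnicate some files")
parser.add_argument("files", nargs="+", help="input files")
parser.add_argument("-v", "--verbose", action="store_true",
                    help="enable chatty output")
parser.add_argument("-n", "--count", type=int, default=1,
                    help="number of passes (default: 1)")

# Parsing an explicit argument list rather than sys.argv, for demonstration:
args = parser.parse_args(["-v", "-n", "3", "a.txt", "b.txt"])
print(args.verbose, args.count, args.files)
```

Subcommands (via add_subparsers()) are where the "command-line languages" aspect comes in, but the basics look like the above.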
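The concurrent.futures interface is simple enough to show in a few lines; this sketch submits trivial tasks to a thread pool and gathers the results (swapping in ProcessPoolExecutor parallelizes across processes with the same API):

```python
# Minimal sketch of concurrent.futures, new in 3.2: submit callables to a
# pool and collect results as the futures complete.
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(square, n) for n in range(5)]
    # as_completed() yields futures in completion order, so sort for stability
    results = sorted(f.result() for f in as_completed(futures))

print(results)
```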

A couple of the most significant improvements may be elsewhere, though. One of those is the definition of a stable ABI for extension modules. Anybody who has been through a Python version update knows that the associated rebuilding of extension modules is not a lot of fun. As of version 3.2, modules which restrict themselves to a subset of the extension module ABI should continue to work indefinitely into the future. It's not yet clear how many real-world modules can live within the restrictions of this ABI; also unclear is how much that ABI could be extended without slowing further development of the language. But it's a step in the right direction toward the solution of a real problem.

Another partial solution to an ongoing problem can be found in the rewrite of the global interpreter lock (GIL). The GIL is Python's equivalent to the kernel's Big Kernel Lock; it ensures that only one thread can be executing in the bytecode interpreter at any given time. Since running bytecode is what Python programs do, the GIL can be seen as a rather significant constraint on how much concurrency is possible in a multi-threaded environment. Some extension modules release the GIL while they are doing extensive computations, and the GIL (like the BKL) is released while waiting for I/O, but that doesn't solve the real problem. The failure to remove (or at least reduce the role of) the GIL during the Python 3 development process is, for many developers, one of the biggest disappointments of Python 3.

The 3.2 GIL rewrite does not change the fundamental nature of the GIL, but it does reduce its impact somewhat. As described by Antoine Pitrou, the principal hacker behind this work, two significant changes have been made:

  • Previously, the GIL would be passed from one contending thread to the next after a certain number of opcodes had been executed. But opcodes do not execute in constant time, and some of them (such as calls into an extension module) can execute for a long time indeed. The new GIL is, instead, passed on after a bounded time period (5ms by default).

  • The GIL is implemented in an inherently unfair manner; once it has been released, any thread which comes along can claim it. Prior to 3.2, that "any thread" was often the thread which had just released the lock. That thread is supposed to wait before attempting to reacquire the GIL, but the fact that it is running and cache-hot means it is still likely to get there first. The new GIL is still unfair, but it will at least force the releasing thread to wait until a contending thread has acquired the lock. That should fix some of the long latencies seen by Python programmers in some situations.
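The 5ms default described above is visible, and tunable, from Python code. A small sketch: sys.getswitchinterval() and sys.setswitchinterval() (which replaced the old opcode-count-based sys.setcheckinterval() as part of this work) control how often the interpreter offers to hand off the GIL:

```python
# Inspect and adjust the GIL switch interval (seconds), new in 3.2.
import sys

default = sys.getswitchinterval()   # 0.005 (5ms) unless changed
sys.setswitchinterval(0.001)        # ask for more frequent GIL handoffs
print(default, sys.getswitchinterval())
sys.setswitchinterval(default)      # restore the default
```

A shorter interval improves responsiveness of I/O-bound threads at the cost of more switching overhead; a longer one does the reverse.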

Given the scalability limitations inherent in a single, global lock, one might think that eliminating that lock would be a priority for the Python developers. The Python glossary suggests that this isn't the case:

Past efforts to create a "free-threaded" interpreter (one which locks shared data at a much finer granularity) have not been successful because performance suffered in the common single-processor case. It is believed that overcoming this performance issue would make the implementation much more complicated and therefore costlier to maintain.

The addition of fine-grained locking which did not hurt single-threaded code could certainly be a bit of work; it might well involve techniques like run-time patching of the interpreter. For a system which is supposed to run on many operating systems, such a solution could indeed be brittle and hard to maintain. In its absence, though, the scalability of multi-threaded Python programs will continue to be limited.

That said, Python 3 is clearly getting better. Over time, adoption appears to be on the increase; the number of distributions and modules which support the language is growing. Python 3 continues to be a sufficiently hard sell that a group of developers recently contemplated reopening feature-oriented development on version 2.x, but that idea fell by the wayside when it became clear that the developer interest wasn't there. Python 3 thus appears to be the future for those who want a language which continues to evolve. Based on what can be seen in the 3.2 release, that evolution is going full speed, even in the face of a moratorium on new core features.



Python 3.2: toward the future of the language

Posted Feb 24, 2011 2:54 UTC (Thu) by kragilkragil (guest, #72832) [Link] (28 responses)

I am sorry, but Python3 is a big letdown for me. I really hoped that they would improve the core of the interpreter. If you break backwards compatibility you should have also removed the most obvious performance killers, right?

The way they did it was to break compatibility with no performance gain for a slightly better syntax and a few enhancements. Not enough IMO.

I wouldn't be surprised if languages that deal better with multi-core systems and have better performance, like Go or Scala, eat a lot of Python's lunch in the application space.

Python 3.2: toward the future of the language

Posted Feb 24, 2011 4:14 UTC (Thu) by k8to (guest, #15413) [Link] (27 responses)

Multithreaded code is the devil. I work with a highly multithreaded codebase right now where any number of concurrency problems abound. It's seriously harder than anyone typically will admit to themselves.

We're in the process of slowly, painfully, breaking out concurrent activities into separate processes, which is really fundamentally necessary for memory allocation and robustness reasons.

It's not really that much harder (often easier) to design a multiprocess program. It's definitely harder to maintain a multithreaded program.

As for the performance issue: yeah, python is still slow. I don't think you can really get a much better performing python without fundamentally changing the approach to the interpreter (pypy) or fundamentally changing the language (shedskin).

I'm also disappointed with python3's changes. I think just language cleanups would have been reasonable, but I don't think enough was cleaned up to make it worth the switch. The only really huge one is strings and unicode, which isn't sexy but is really important.

However, the ability to run lots of threads performantly isn't even on the radar. If you're trying to do high performance computing, you probably shouldn't be in Python in the first place. If your problem cannot be addressed with multiple processes, it probably shouldn't be in Python either.

Python 3.2: toward the future of the language

Posted Feb 24, 2011 5:57 UTC (Thu) by flewellyn (subscriber, #5047) [Link] (9 responses)

Good heavens, yes. The number of times I have run into Python programmers who complain about the GIL, and stare blankly when I say: "Umm, fork? IPC?"

What are they teaching them these days?

Python 3.2: toward the future of the language

Posted Feb 24, 2011 10:26 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

Quite often the need to marshal/unmarshal data for IPC eats all advantages of multithreading.

Python 3.2: toward the future of the language

Posted Feb 24, 2011 18:13 UTC (Thu) by flewellyn (subscriber, #5047) [Link] (5 responses)

Assuming that you use message passing IPC, perhaps. But there's also shared memory, which does not have that disadvantage.

Python 3.2: toward the future of the language

Posted Feb 24, 2011 20:37 UTC (Thu) by ballombe (subscriber, #9523) [Link] (4 responses)

But shared memory reintroduces all the hard problems of multi-threading.

Python 3.2: toward the future of the language

Posted Feb 24, 2011 21:45 UTC (Thu) by oak (guest, #2786) [Link]

With the process model the developer tends to (has to) think more about isolation of processing beforehand, so it gives a mindset where you don't end up as easily in the debugging hell you so often have when trying to understand/fix (somebody else's) bad code that uses threads.

Python 3.2: toward the future of the language

Posted Feb 24, 2011 21:53 UTC (Thu) by flewellyn (subscriber, #5047) [Link] (2 responses)

No, there's a big difference: with threading, everything defaults to shared. With explicit shared memory, everything defaults to unshared, and you explicitly select that which you want to share. So, you can much more easily reason about protecting access to the shared resource.

Python 3.2: toward the future of the language

Posted Mar 3, 2011 1:36 UTC (Thu) by jlokier (guest, #52227) [Link]

Which is how it's done in Perl threads. Should I use Perl? :-)

Python 3.2: toward the future of the language

Posted Mar 3, 2011 9:48 UTC (Thu) by renox (guest, #23785) [Link]

>No, there's a big difference: with threading, everything defaults to shared.

The OS view of the memory defaults to shared, yes, but this doesn't mean the language has to do the same. D, for example, doesn't: its default is thread-local storage:
http://www.digitalmars.com/d/2.0/migrate-to-shared.html

Python 3.2: toward the future of the language

Posted Feb 28, 2011 10:09 UTC (Mon) by daniel_svensson (guest, #28746) [Link] (1 responses)

A piece of software I've been working on for quite some time was originally written in Python. We struggled with the GIL quite a bit. In the end we were using the multiprocessing package, and it worked pretty well with 8 to 50 processes depending on configuration. We were able to use about 300-400% of CPU on our 8-core server, but the throughput was still pathetic, and we suffered long startup times due to having to copy data between processes.

An interesting fact was that disabling the Python garbage collection, after getting rid of the small number of circular references we had, increased throughput of the multiprocessing module (pickle) by about 20 times. This also got rid of 10-second stalls at every GC pass (huge heap).

In the end we migrated pretty much the same architecture to Java under the JVM, and increased throughput by 10-40 times depending on configuration. It's sad that Python doesn't perform better in scenarios where you have a problem that scales concurrently, and you have a fat server with lots of cores/cpus. Parts of our team feel uncomfortable with the Java language, but none of us can ignore that the JVM is crazy awesome technology compared to the cpython runtime.
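The garbage-collection trick described in the comment above is a one-liner. A hedged sketch: CPython's cycle collector can be switched off when a program is known (or has been made) free of reference cycles, leaving reference counting alone to reclaim memory; the placeholder workload below stands in for the commenter's pickle-heavy code:

```python
# Sketch: disabling the cycle collector for a cycle-free workload.
# Reference counting still frees objects; only cycle detection stops.
import gc

gc.disable()
assert not gc.isenabled()

# ... run the allocation-heavy, cycle-free workload here ...

gc.enable()
print(gc.isenabled())
```

gc.freeze() and generation tuning via gc.set_threshold() are gentler variants of the same idea when disabling GC outright is too risky.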

Python 3.2: toward the future of the language

Posted Feb 28, 2011 13:00 UTC (Mon) by flewellyn (subscriber, #5047) [Link]

So...why not use Jython, then?

Use asynchronous messaging instead of threads

Posted Feb 24, 2011 8:15 UTC (Thu) by Cato (guest, #7643) [Link] (8 responses)

The pain of getting threaded code to work correctly is not just a Python issue - PHP, Ruby, and others are in the same boat (some Java-based interpreters for Ruby etc. do support threading, but you still have to get your threaded code to work).

One interesting technique to avoid having to use multithreading within an interpreter is to connect multiple processes with lightweight messaging. ZeroMQ is particularly lightweight (well below 100 microseconds round trip for a hello-world example in C using sockets) and easy to use from many scripting languages as well as C/C++/OCaml etc (see https://lwn.net/Articles/393235/ for more on ZeroMQ).

Once you have async messaging between processes, the overhead of writing socket servers goes away, and more importantly you don't need to write any locking code so that threading works.

You can also use more than one language if required, e.g. Python for most purposes and OCaml for performance-sensitive requirements (since its performance is competitive with C, without the memory usage of Java, and it is still very high level).

More on messaging vs. threads from the ZeroMQ developers: http://www.zeromq.org/blog:multithreading-magic

Use asynchronous messaging instead of threads

Posted Feb 28, 2011 18:02 UTC (Mon) by NAR (subscriber, #1313) [Link] (1 responses)

> Once you have async messaging between processes, the overhead of writing socket servers goes away, and more importantly you don't need to write any locking code so that threading works.

That's the promise of Erlang. However, in my experience, this just moves the problem one abstraction level up - there's no problem with shared memory, but there is still a problem with shared state. At least it doesn't crash, it just deadlocks.

Use asynchronous messaging instead of threads

Posted Mar 3, 2011 11:22 UTC (Thu) by Cato (guest, #7643) [Link]

Good point - but then any shared state, even if mediated by a ZeroMQ-connected state server, must have locks, just as MySQL has locks while being accessed over sockets. I think the ZMQ approach is to do as much as possible with non-session-based messages that are idempotent - for these, any locks are internal to the 'state server', and if it's a single thread/process that owns the state, there's no need for locks (e.g. if you can dispatch each request by hashing to a server process that owns the data for such requests).

Use asynchronous messaging instead of threads

Posted Mar 2, 2011 16:33 UTC (Wed) by njs (subscriber, #40338) [Link] (3 responses)

I've wanted to use ZeroMQ, but last I checked, if you use their library then you have to use their event loop. So if you want ZMQ and a GUI, then you have to either use threads to run both event loops, or else run a busy loop polling both every few dozen milliseconds. (The new version of IPython uses ZMQ, and it takes the polling approach :-(.)

They really need a lower-level API, where their library tells you what sockets need to be monitored for IO, and then lets you take care of making that happen. (Even if that might add a few microseconds of overhead.) Until that happens I can't in good conscience use it.

Use asynchronous messaging instead of threads

Posted Mar 3, 2011 11:26 UTC (Thu) by Cato (guest, #7643) [Link] (2 responses)

Worth raising on the ZeroMQ email list. You can poll on multiple FDs including ZeroMQ sockets and non-ZeroMQ FDs, but I guess that's what you mean by using their event loop.

There are people who've done integrations of ZeroMQ to various evented / reactor style frameworks for Ruby, Lua, etc, and there's also a Qt integration that merges with Qt's event loop: http://www.zeromq.org/docs:labs

Use asynchronous messaging instead of threads

Posted Mar 3, 2011 15:39 UTC (Thu) by njs (subscriber, #40338) [Link] (1 responses)

I see, yeah, it looks like they fixed this in 2.1.0 and then hid the docs really well (inside the zmq_getsockopt man page). Good to hear, thanks!

Use asynchronous messaging instead of threads

Posted Mar 3, 2011 22:25 UTC (Thu) by Cato (guest, #7643) [Link]

Ah yes, the ZMQ_FD option in http://api.zeromq.org/master:zmq-getsockopt

Use asynchronous messaging instead of threads

Posted Mar 3, 2011 9:53 UTC (Thu) by renox (guest, #23785) [Link] (1 responses)

But if you have developers who know OCaml, why wouldn't you write everything in OCaml instead of using a mix of Python and OCaml?

I'm not trolling: that's an honest question.

Use asynchronous messaging instead of threads

Posted Mar 3, 2011 11:35 UTC (Thu) by Cato (guest, #7643) [Link]

Without wanting to start a language war, as I like both languages...

Some of the strengths of Python vs. OCaml: very wide adoption, many third party and core libraries, many application frameworks, IDE support, tools support (unit testing, continuous integration, etc), easy to apply OO design patterns, etc. (Some of these things probably exist in OCaml but there would be fewer options and less rich support typically.) Most of these come down to "Python has been around quite a long time and is widely used".

OCaml's strengths include 'if it compiles it will usually work' due to its type system, very good native code performance without language restrictions, reasonable memory use, strong functional support, etc. The learning curve is a bit steeper than Python but then it lets you do more once you understand functional programming. The build system is pretty horrible in my experience - a confusing set of closely related files to be built and managed, and the docs aren't very good on that part.

So you might choose to do different parts of a single complex application in Python and OCaml, e.g. the GUI/Web app/mobile app parts in Python (or Ruby/PHP, the arguments are similar), and have the core business/app logic and web services / database in OCaml. ZeroMQ is a good way to connect all this together.

Microsoft F#, which is quite close to OCaml, shows signs of increasing interest in OCaml, simply by bringing the concept of a fast functional language to a very wide audience - you can even get IDE support and commercial training for F# which I haven't seen for OCaml.

Multiprocessing to the rescue

Posted Feb 24, 2011 9:27 UTC (Thu) by smurf (subscriber, #17840) [Link] (4 responses)

There's a nice "multiprocessing" module which tries very hard to be a drop-in replacement for "threading". If that's too slow or too restrictive, there are numerous other libraries which one can investigate.

In any case, the all-too-common GIL complaints are beside the point. CPython is a reference-counted implementation of the language, and that's unlikely to change. You cannot keep reference counters consistent without locking.
Protecting every single data structure with its personal lock is inefficient in both time and space; there simply is no way around that fact.

Multiprocessing to the rescue

Posted Feb 24, 2011 11:28 UTC (Thu) by johill (subscriber, #25196) [Link] (3 responses)

It seems to me that if reference counting were the only problem, you'd use atomic_t variables and be done with it; so either I'm missing something (tell me what?) or it's not quite that simple.

Multiprocessing to the rescue

Posted Feb 24, 2011 12:19 UTC (Thu) by smurf (subscriber, #17840) [Link]

You'd also need locking around accesses to dictionary contents, object attribute accesses, and whatnot.

Consider what might happen when thread A deletes an attribute while B does a lookup on the same thing.
B finds/dereferences the entry, but -- just before it gets around to actually incrementing the refcount -- A comes along, decreases the counter to zero, decides the value can be freed and happily proceeds to do so while B increments the counter -- and then proceeds to trample all over freed memory. Or worse.
Dealing with this by read-locking every single attribute lookup, dictionary and list access (even for reading!) is just too expensive.

The kernel deals with this problem by way of RCU (Read-Copy-Update). My (admittedly cursory) look at the Python internals tells me that converting the CPython runtime to RCUs would be a multi-month project, with nontrivial impact on the external module API -- and I didn't even look at the thread sync problem, which the kernel is able to deal with by way of its comparatively simple "wait until everybody has called schedule() once" method.

Another problem is that some of these counters are heavily contended. An atomic_t would bounce around CPUs like crazy.

Multiprocessing to the rescue

Posted Feb 24, 2011 16:55 UTC (Thu) by dlang (guest, #313) [Link]

there is a large amount of overhead in using atomic variables, so it's not something that you want to do by default (even ignoring the fact that changing one variable in a structure, or bytes in a string one at a time probably won't make sense)

Multiprocessing to the rescue

Posted Feb 28, 2011 9:50 UTC (Mon) by jamesh (guest, #1159) [Link]

You are right that more than just the reference counts are protected by the GIL. Many types rely on the GIL to serialise access to their internal data structures.

For example, if you append an item to a list and it needs to reallocate its internal array, the GIL makes sure no other threads try to read the freed memory. You'll find similar assumptions in other mutable data types.
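That serialization is observable from Python code. A sketch of the point above: because the GIL makes list.append effectively atomic, concurrent appends from several threads never corrupt the list's internal array, even across reallocations - no appends are lost and nothing crashes (though compound operations like read-modify-write still need explicit locks):

```python
# Sketch: list.append is effectively atomic under the GIL, so concurrent
# appends are safe even while the list's internal array is reallocated.
import threading

shared = []

def appender(n):
    for i in range(n):
        shared.append(i)

threads = [threading.Thread(target=appender, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(shared))
```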

Python 3.2: toward the future of the language

Posted Feb 24, 2011 17:00 UTC (Thu) by jthill (subscriber, #56558) [Link] (2 responses)

CCP seems to be doing pretty well on multithread performance with stackless. 'Course, coroutines count as a fundamental change, but it seems to me that concept has been nearly anathema for a long time. Anybody have a short answer for "why?"?

Python 3.2: toward the future of the language

Posted Mar 1, 2011 4:50 UTC (Tue) by i3839 (guest, #31386) [Link]

Because coroutines are great for multitasking on one CPU, but rubbish for multitasking on multiple CPUs? That is, they coordinate CPU sharing, but not data sharing between multiple CPUs.

Python 3.2: toward the future of the language

Posted Mar 3, 2011 11:57 UTC (Thu) by daishan (guest, #47363) [Link]

CCP is far from doing well. They're trying to get their hands on processors with high single-core performance to keep up with increasing numbers of players gathering in a single place.

They've divided the game world into solar systems, and every server process on their cluster controls one or more solar systems. When a large number of players gather in a single system (for a large battle, or because it is a popular trade hub), the process responsible for that solar system is limited to a single CPU core and can't keep up. They've split out many auxiliary functions (market transactions, for example) to special dedicated nodes, but the basic interactions between players (primarily moving and shooting each other) are hard to divide into multiple processes.

I don't know whether this problem would be easily resolvable if Python didn't have the GIL and could make use of multiple cores simultaneously by multithreading, but it seems that, at least in this case, multiprocessing isn't the answer.

Python 3.2: toward the future of the language

Posted Feb 24, 2011 4:57 UTC (Thu) by zooko (guest, #2589) [Link]

> Python 3 thus appears to be the future for those who want a language which continues to evolve.

I think it remains to be seen. To me, the most promising upgrade from CPython 2.7 currently appears to be PyPy, which is already faster than CPython on many measurements and is actively developed.

Maybe by the time I need to upgrade away from CPython 2.7, PyPy will support Python 3. Or maybe not. Maybe instead Python 3 development will have trailed off and all the hot new features will be available in PyPy in Python 2.

Python 3.2: toward the future of the language

Posted Feb 26, 2011 0:42 UTC (Sat) by robinson (guest, #4830) [Link]

Thank you for this article and last week's article on Python 3. As a Py2 developer I have held off on Py3 due to not understanding what changes were going to be required in my code. These two articles have been very informative.


Copyright © 2011, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds