
Leading items

Welcome to the LWN.net Weekly Edition for March 5, 2020

This edition contains the following feature content:

  • The costs of continuous integration: the freedesktop.org GitLab CI services have become too expensive for the X.Org Foundation to sustain.
  • Attestation for kernel patches: a proposal to verify the provenance of patches sent by email.
  • An end to high memory?: developers consider deprecating the kernel's high-memory support.
  • Unexporting kallsyms_lookup_name(): closing off an easy way for modules to bypass the kernel's symbol-export restrictions.
  • Python time-zone handling: PEP 615 would add support for the IANA time-zone database to the standard library.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

The costs of continuous integration

By Jake Edge
March 4, 2020

By most accounts, the freedesktop.org (fd.o) GitLab instance has been a roaring success; lots of projects are using it, including Mesa, Linux kernel graphics drivers, NetworkManager, PipeWire, and many others. In addition, a great deal of continuous-integration (CI) testing is being done on a variety of projects under the fd.o umbrella. That success has come at a price, however. A recent message from the X.Org Foundation, which merged with fd.o in 2019, has made it clear that the current situation is untenable from a financial perspective. Given its current resources, X.Org cannot continue covering those costs beyond another few months.

X.Org board member Daniel Vetter posted a message to multiple mailing lists on February 27. In it, he noted that the GitLab instance has become quite popular, including the CI integration, which is good news. But there is some bad news to go with it:

The cost in growth has also been tremendous, and it's breaking our bank account. With reasonable estimates for continued growth we're expecting hosting expenses totalling 75k USD this year, and 90k USD next year. With the current sponsors we've set up we can't sustain that. We estimate that hosting expenses for gitlab.fd.o without any of the CI features enabled would total 30k USD, which is within X.org's ability to support through various sponsorships, mostly through XDC.

The expense growth is mainly from "storing and serving build artifacts and images to outside CI runners sponsored by various companies". Beyond that, all of that growth means that a system administrator is needed to maintain the infrastructure, so "X.org is therefore also looking for admin sponsorship, at least medium term". Without more sponsors for the costs of the CI piece, it looks like those services would need to be turned off in May or June, he said. The board is working on finding additional sponsorship money, but that takes time, so it wanted to get the word out.

That set off a discussion of the problem and some possible solutions. Matt Turner was concerned that the bandwidth expense had not been adequately considered when the decision was made to self-host the GitLab instance. "Perhaps people building the CI would make different decisions about its structure if they knew it was going to wipe out the bank account." He wondered if the tradeoff was worth the cost:

I understand that self-hosting was attractive so that we didn't find ourselves on the SourceForge-equivalent hosting platform of 2022, but is that risk real enough to justify spending 75K+ per year? If we were hosted on gitlab.com or github.com, we wouldn't be paying for transferring CI images to CI test machines, etc, would we?

Daniel Stone, one of the administrators for the fd.o infrastructure (who gave a talk on the organization's history at the 2018 X.Org Developers Conference), filled in some of the numbers for the costs involved. He said that the bill for January was based on 17.9TB of network egress (mostly copying CI images to the test-running systems) and 16TB of storage for "CI artifacts, container images, file uploads, [and] Git LFS". That totaled almost $4,000, so the $75,000 projection takes into account further growth. In a follow-up message, he detailed the growth as well:

For context, our storage & network costs have increased >10x in the past 12 months (~$320 Jan 2019), >3x in the past 6 months (~$1350 July 2019), and ~2x in the past 3 months (~$2000 Oct 2019).

Stone also noted that the Google Cloud Platform (where the GitLab instance is hosted) does not provide all of the platform types needed for running the CI system. For example, Arm-based DragonBoards are needed, so some copying to external testing systems will be required. Using the cloud services means that bandwidth is metered, which is not necessarily true in other hosting setups, such as virtual private servers, as Jan Engelhardt pointed out. That would require more system administration costs, however, which Stone thinks would now make sense:

I do now (personally) think that it's crossed the point at which it would be worthwhile paying an admin to solve the problems that cloud services currently solve for us - which wasn't true before.

Dave Airlie argued that the CI infrastructure should be shut down "until we work out what a sustainable system would look like within the budget we have". He thought that it would be difficult to attract sponsors to effectively pay Google and suggested that it would make more sense for Google to cut out the middleman: "Having google sponsor the credits costs google substantially less than having any other company give us money to do it."

Vetter said that Google has provided $30,000 in hosting credits over the last year, but that money "simply ran out _much_ faster than anyone planned for". In addition, there are plenty of other ways that companies can sponsor the CI system:

Plus there's also other companies sponsoring CI runners and what not else in equally substantial amounts, plus the biggest thing, sponsored admin time (more or less officially). So there's a _lot_ of room for companies like Red Hat to sponsor without throwing any money in google's revenue stream.

The lack of any oversight of what gets run in the CI system and which projects are responsible for it is part of the problem, Airlie said. "You can't have a system in place that lets CI users burn [large] sums of money without authorisation, and that is what we have now." Vetter more or less agreed, but said that the speed of the growth caught the board by surprise, "so we're a bit behind on the controlling aspect". There is an effort to be able to track the costs by project, which will make it easier to account for where the money is going—and to take action if needed.

As part of the reassessment process, Kristian Høgsberg wanted to make sure that the "tremendous success" of the system was recognized. "Between gitlab and the CI, our workflow has improved and code quality has gone up." He said that it would have been hard to anticipate the growth:

[...] it seems pretty standard engineering practice to build a system, observe it and identify and eliminate bottlenecks. Planning never hurts, of course, but I don't think anybody could have realistically modeled and projected the cost of this infrastructure as it's grown organically and fast.

Reducing costs

The conversation soon turned toward how to reduce the cost in ways that would not really impact the overall benefit that the system is providing. There may be some low-hanging fruit in terms of which kinds of changes actually need testing on all of the different hardware. As Erik Faye-Lund put it:

It feels silly that we need to test changes to e.g the i965 driver on dragonboards. We only have a big "do not run CI at all" escape-hatch.

[...] We could also do stuff like reducing the amount of tests we run on each commit, and punt some testing to a per-weekend test-run or [something] like that. We don't *need* to know about every problem up front, just the stuff that's about to be released, really. The other stuff is just nice to have. If it's too expensive, I would say drop it.

There were other suggestions along those lines, as well as discussion of how to use GitLab features to reduce some of the "waste" in the amount of CI testing that is being done. It is useful to look at all of that, but Jason Ekstrand cautioned against getting too carried away:

I don't think we need to worry so much about the cost of CI that we need to micro-optimize to get the minimal number of CI runs. We especially shouldn't if it begins to impact code quality, people's ability to merge patches in a timely manner, or visibility into what went wrong when CI fails.

He continued by noting that more data will help guide the process, but he is worried about the effect on the development process of reducing the amount of CI testing:

I'm just trying to say that CI is useful and we shouldn't hurt our development flows just to save a little money unless we're truly desperate. From what I understand, I don't think we're that desperate yet. So I was mostly trying to re-focus the discussion towards straightforward things we can do to get rid of pointless waste (there probably is some pretty low-hanging fruit) and away from "OMG X.org is running out of money; CI as little as possible". [...]

[...] I'm fairly hopeful that, once we understand better what the costs are (or even with just the new data we have), we can bring it down to reasonable and/or come up with money to pay for it in fairly short order.

Vetter is also worried that it could be somewhat difficult to figure out what tests are needed for a given change, which could result in missing out on important test runs:

I think [it's] much better to look at filtering out CI targets for when nothing relevant happened. But that gets somewhat tricky, since "nothing relevant" is always only relative to some baseline, so bit of scripting and all involved to make sure you don't run stuff too often or (probably worse) not often enough.

In any case, the community is now aware of the problem and is pitching in to start figuring out how best to attack it. Presumably some will also be working with their companies to see if they can contribute as well. Any of the solutions are likely to take some amount of effort both for developers using the infrastructure and for the administrators of the system. GitLab's new open-source program manager, Nuritzi Sanchez, also chimed in; the company is interested in ensuring that community efforts like fd.o are supported beyond just the migration help that was already provided, she said. "We’ll be exploring ways for GitLab to help make sure there isn’t a gap in coverage during the time that freedesktop looks for sponsors."

While it may have come as a bit of a shock to some in the community, the announcement would seem to have served its purpose. The community now has been informed and can start working on the problem from various directions. Given the (too) runaway success of the CI infrastructure, one suspects that a sustainable model will be found before too long—probably well ahead of the (northern hemisphere) summer cutoff date.

Comments (49 posted)

Attestation for kernel patches

By Jonathan Corbet
March 2, 2020
The kernel development process is based on trust at many levels — trust in developers, but also in the infrastructure that supports the community. In some cases, that trust may not be entirely deserved; most of us have long since learned not to trust much of anything that shows up in email, for example, but developers still generally trust that emailed patches will be what they appear to be. In his ongoing effort to bring more security to kernel development, Konstantin Ryabitsev has proposed a patch attestation scheme that could help subsystem maintainers verify the provenance of the patches showing up in their mailboxes.

One might wonder why this work is needed at all, given that email attestation has been widely available for almost exactly as long as the kernel has existed; Phil Zimmermann first released PGP in 1991. PGP (and GnuPG, its free-software replacement) have always been painful to use, though, even before considering their interference with patches and the review process in particular; PGP-signed mail can require escaping characters or be mangled by mail transfer agents. It is safe to say that nobody bothers checking the few PGP signatures that exist on patches sent via email.

Ryabitsev's goal is to make attestation easy enough that even busy kernel developers will be willing to add it to their workflow. The scheme he has come up with is, for now, meant for integration with processes that involve using git send-email to send out a set of patches, though it is not tightly tied to that workflow. A developer can add attestation to their process by creating a directory full of patches and sending them out via git send-email in the usual manner; attestation is then done as a separate step, involving an additional email message.

In particular, the developer will run the attest-patches tool found in Ryabitsev's korg-helpers repository. It will look at each patch and split it into three components:

  • Some patch metadata: specifically the author's name and email address, along with the subject line.
  • The commit message.
  • The patch itself.

The tool will use sha256sum to create a separate SHA-256 checksum for each of the three components. The three checksums are then joined, in an abbreviated form, to create a sort of unique ID for the patch that looks like:

    2a02abe0-215cf3f1-2acb5798

The attest-patches tool creates a file containing this "attestation ID", along with the full checksums for all three components:

    2a02abe0-215cf3f1-2acb5798:
      i: 2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e
      m: 215cf3f133478917ad147a6eda1010a9c4bba1846e7dd35295e9a0081559e9b0
      p: 2acb5798c366f97501f8feacb873327bac161951ce83e90f04bbcde32e993865

A block like this is generated for each patch given to attest-patches. The result happens to be a file in the YAML format, but one can live in ignorance of that fact without ill effect. The file is then passed to GnuPG for signing. The final step is to email this file to signatures@kernel.org, where it will appear on a public mailing list; attest-patches can perform this step automatically.
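
The ID derivation is simple enough to sketch in a few lines of Python. The following is purely illustrative (the helper names and the way the three components are passed in are assumptions, not the actual korg-helpers code), but it shows how the abbreviated checksums are combined:

    import hashlib

    def _sha256(text):
        # Full SHA-256 checksum of one patch component, as a hex string.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def attestation_block(metadata, message, patch):
        # The "i", "m", and "p" lines correspond to the metadata, commit
        # message, and patch body described above; the attestation ID joins
        # the first eight hex digits of each checksum with dashes.
        i, m, p = (_sha256(part) for part in (metadata, message, patch))
        att_id = "-".join(s[:8] for s in (i, m, p))
        return "%s:\n  i: %s\n  m: %s\n  p: %s" % (att_id, i, m, p)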

On the receiving end, a reviewer or subsystem maintainer runs get-lore-mbox with the -aA options; -A does not actually exist yet but one assumes it will appear shortly. As the tool creates a mailbox file suitable for feeding to git am, it will verify the attestation for each of the patches it finds. That is done by generating its own attestation ID for each patch, then using that ID to search for messages on the signatures mailing list. If any messages are found, the full checksum for each of the three patch components is checked. The GPG signature in the file is also checked, of course.

If the checks pass — meaning that an applicable signature message exists, the checksums match the patches in question, and the message is signed by a developer known to the recipient — then get-lore-mbox will create the requested mailbox file, adding a couple of tags to each patch describing the attestation that was found. Otherwise the tool will abort after describing where things went wrong.
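
Verification is essentially the same computation run in reverse: regenerate the ID from the received patch, look for a signed attestation that contains that ID, and compare the full checksums. A minimal sketch, reusing the hypothetical _sha256() helper from the block above and assuming the signature file has already been GPG-verified and parsed into a dictionary:

    def attestation_matches(metadata, message, patch, attestations):
        # 'attestations' maps attestation IDs to their "i"/"m"/"p" checksums,
        # as parsed from a verified message on the signatures list.
        i, m, p = (_sha256(part) for part in (metadata, message, patch))
        att_id = "-".join(s[:8] for s in (i, m, p))
        entry = attestations.get(att_id)
        if entry is None:
            return False  # no signed attestation covers this patch
        return (entry["i"], entry["m"], entry["p"]) == (i, m, p)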

A test run of the system has already been done; Kees Cook generated an attestation message for this patch series. He said that this mechanism would be "utterly trivial" to add to his normal patch-generation workflow.

Jason Donenfeld, instead, was unconvinced of the value of this infrastructure. He argued that "maintainers should be reading commits as they come in on the mailing list" and that attestation would make the contribution process harder. He asked: "is the lack of signatures on email patches a real problem we're facing?"

Ryabitsev responded that he saw this mechanism as addressing two specific threats:

  • An "overworked, tired maintainer" may be tempted to perform cursory reviews of patches from trusted developers; attestation at least lets them know that those patches actually came from their alleged author.
  • Maintainers might diligently review patches arriving in email, then use a tool like get-lore-mbox to fetch those patches for easy application. If lore.kernel.org has been compromised, it could return a modified form of those patches and the maintainer may well never notice. Once again, attestation should block any such attack.

He ended with a hope that the process he has developed is easy enough that developers will actually use it.

Whether that will actually happen remains to be seen. The use of signed tags on pull requests is still far from universal, despite the fact that they, too, are easy to generate and Linus Torvalds requires them for repositories not hosted on kernel.org. Based on past discussions, it seems unlikely that Torvalds will require attestation for emailed patches. So if patch attestation is to become widespread in the kernel community, it will be as a result of lower-level maintainers deciding that it makes sense. Of course, a successful attack could change attitudes quickly.

Comments (11 posted)

An end to high memory?

By Jonathan Corbet
February 27, 2020
This patch from Johannes Weiner seemed like a straightforward way to improve memory-reclaim performance; without it, the virtual filesystem layer throws away memory that the memory-management subsystem thinks is still worth keeping. But that patch quickly ran afoul of a feature (or "misfeature" depending on who one asks) from the distant past, one which goes by the name of "high memory". Now, more than 20 years after its addition, high memory may be brought down low, as developers consider whether it should be deprecated and eventually removed from the kernel altogether.

A high-memory refresher

The younger readers out there may be forgiven for not remembering just what high memory is, so a quick refresher seems in order. We'll start by noting, for the oldest among our readers, that it has nothing to do with the "high memory" concept found on early personal computers. That, of course, was memory above the hardware-implemented hole at 640KB — memory that was, according to a famous quote often attributed to Bill Gates, surplus to the requirements of any reasonable user. The kernel's notion of high memory, instead, is a software construct, not directly driven by the hardware.

Since the earliest days, the kernel has maintained a "direct map", wherein all of physical memory is mapped into a single, large, linear array in kernel space. The direct map makes it easy for the kernel to manipulate any page in the system; it also, on somewhat newer hardware, is relatively efficient since it is mapped using huge pages.

A problem arose, though, as memory sizes increased. A 32-bit system has the ability to address 4GB of virtual memory; while user space and the kernel could have distinct 4GB address spaces, arranging things that way imposes a significant performance cost resulting from the need for frequent translation lookaside buffer flushes. To avoid paying this cost, Linux used the same address space for both kernel and user mode, with the memory protections set to prevent user space from accessing the kernel's portion of the shared space. This arrangement saved a great deal of CPU time — at least, until the Meltdown vulnerability hit and forced the isolation of the kernel's address space.

The kernel, by default, divided the 4GB virtual address space by assigning 3GB to user space and keeping the uppermost 1GB for itself. The kernel itself fits comfortably in 1GB, of course — even 5.x kernels are smaller than that. But the direct memory map, which is naturally as large as the system's installed physical memory, must also fit into that space. Early kernels could only manage memory that could be directly mapped, so Linux systems, for some years, could only make use of a bit under 1GB of physical memory. That worked for a surprisingly long time; even largish server systems didn't exceed that amount.

Eventually, though, it became clear that the need to support larger installed memory sizes was coming rather more quickly than 64-bit systems were, so something would need to be done. The answer was to remove the need for all physical memory to be in the direct map, which would only contain as much memory as the available address space would allow. Memory above that limit was deemed "high memory". Where the dividing line sat depended entirely on the kernel configuration and how much address space was dedicated to kernel use, rather than on the hardware.

In many ways, high memory works like any other; it can be mapped into user space and the recipients don't see any difference. But being absent from the direct map means that the kernel cannot access it without creating a temporary, single-page mapping, which is expensive. That implies that high memory cannot hold anything that the kernel must be able to access quickly; in practice, that means any kernel data structure at all. Those structures must live in low memory; that turns low memory into a highly contended resource on many systems.

64-Bit systems do not have the 4GB virtual address space limitation, so they have never needed the high-memory concept. But high memory remains for 32-bit systems, and traces of it can be seen throughout the kernel. Consider, for example, all of the calls to kmap() and kmap_atomic(); they do nothing on 64-bit systems, but are needed to access high memory on smaller systems. And, sometimes, high memory affects development decisions being made today.

Inode-cache shrinking vs. highmem

When a file is accessed on a Linux system, the kernel loads an inode structure describing it; those structures are cached, since a file that is accessed once will frequently be accessed again in the near future. Pages of data associated with that file are also cached in the page cache as they are accessed; they are associated with the cached inode. Neither cache can be allowed to grow without bound, of course, so the memory-management system has mechanisms to remove data from the caches when memory gets tight. For the inode cache, that is done by a "shrinker" function provided by the virtual filesystem layer.

In his patch description, Weiner notes that the inode-cache shrinker is allowed to remove inodes that have associated pages in the page cache; that causes those pages to also be reclaimed. This happens despite the fact that the inode-cache shrinker has no way of knowing if those pages are in active use or not. This is, he noted, old behavior that no longer makes sense:

This behavior of invalidating page cache from the inode shrinker goes back to even before the git import of the kernel tree. It may have been less noticeable when the VM itself didn't have real workingset protection, and floods of one-off cache would push out any active cache over time anyway. But the VM has come a long way since then and the inode shrinker is now actively subverting its caching strategy.

Andrew Morton, it turns out, is the developer responsible for this behavior, which is driven by the constraints of high memory. Inodes, being kernel data structures, must live in low memory; page-cache pages, instead, can be placed in high memory. But if the existence of pages in the page cache can prevent inode structures from being reclaimed, then a few high-memory pages can prevent the freeing of precious low memory. On a system using high memory, sacrificing many pages worth of cached data may well be worth it to gain a few hundred bytes of low memory. Morton said that the problem being solved was real, and that the solution cannot be tossed even now; "a 7GB highmem machine isn't crazy and I expect the inode has become larger since those days".

The conversation took a bit of a turn, though, when Linus Torvalds interjected that "in the intervening years a 7GB highmem machine has indeed become crazy". He continued that high memory should be now considered to be deprecated: "In this day and age, there is no excuse for running a 32-bit kernel with lots of physical memory". Others were quick to add their support for this idea; removing high-memory would simplify the memory-management code significantly with no negative effects on the 64-bit systems that everyone is using now.

Except, of course, not every system has a 64-bit CPU in it. The area of biggest concern is the Arm architecture, where 32-bit CPUs are still being built, sold, and deployed. Russell King noted that there are a lot of 32-bit Arm systems with more than 1GB of installed memory being sold: "You're probably talking about crippling support for any 32-bit ARM system produced in the last 8 to 10 years".

Arnd Bergmann provided a rather more detailed look at the state of 32-bit Arm systems; he noted that there is one TI CPU that is being actively marketed with the ability to handle up to 8GB of RAM. But, he said, many new Arm-based devices are actually shipping with smaller installed memory because memory sizes up to 512MB are cheap to provide. There are phones out there with 2GB of memory that still need to be supported, though it may be possible to support them without high memory by increasing the kernel's part of the address space to 2GB. Larger systems still exist, he said, though systems with 3GB or more "are getting very rare". Rare is not the same as nonexistent, though.

The conversation wound down without any real conclusions about the fate of high-memory support. Reading between the lines, one might conclude that, while it is still a bit early to deprecate high memory, the pressure to do so will only increase in the coming years. In the meantime, though, nobody will try to force the issue by regressing performance on high-memory systems; the second version of Weiner's patch retains the current behavior on such machines. So users of systems needing high memory are safe — for now.

Comments (36 posted)

Unexporting kallsyms_lookup_name()

By Jonathan Corbet
February 28, 2020
One of the basic rules of kernel-module development is that modules can only access symbols (functions and data structures) that have been explicitly exported. Even then, many symbols are restricted so that only modules with a GPL-compatible license can access them. It turns out, though, that there is a readily available workaround that makes it easy for a module to access any symbol it wants. That workaround seems likely to be removed soon despite the possible inconvenience to some out-of-tree users; the reason why that is happening turns out to be relatively interesting.

The backdoor in question is kallsyms_lookup_name(), which will return the address associated with any symbol in the kernel's symbol table. Modular code that wants to access a symbol ordinarily denied to it can use kallsyms_lookup_name() to get the address of its target, then dereference it in the usual ways. This function itself is exported with the GPL-only restriction, which theoretically limits its use to free software. But if a proprietary module somewhere were to falsely claim a free license to get at GPL-only symbols, it would not be the first time.

Will Deacon has posted a patch series that removes the export for kallsyms_lookup_name() (and kallsyms_on_each_symbol(), which is also open to abuse). There were some immediate positive responses; few developers are favorably inclined toward module authors trying to get around the export system, after all. There were, however, a couple of concerns expressed.

One of those is that there is, it seems, a class of out-of-tree users of kallsyms_lookup_name() that is generally considered to be legitimate: live-patching systems for the kernel. Irritatingly, kernel bugs often stubbornly refuse to restrict themselves to exported functions, so a live patch must be able to locate (and patch out) any function in the kernel; kallsyms_lookup_name() is a convenient way to do that. After some discussion Joe Lawrence let it be known that the kpatch system has all of its needed infrastructure in the mainline kernel, and so has no further need for kallsyms_lookup_name(). The Ksplice system, though, evidently still uses it. As Miroslav Benes observed, though: "no one cares about ksplice in upstream now". So it would appear that live patching will not be an obstacle to the merging of this patch.

A different sort of concern was raised by Masami Hiramatsu, who noted that there are a number of other ways to find the address associated with a kernel symbol. User space could place some kprobes to extract that information, or a kernel module could, if time and CPU use is not a concern, use snprintf() with the "%pF" format (which prints the function associated with a given address) to search for the address of interest. He worried that the change would make life harder for casual developers while not really getting in the way of anybody who is determined to abuse the module mechanism.

In response, Deacon posted an interesting message about what is driving this particular change. Kernel developers are happy to make changes just to make life difficult for developers they see as abusing the system, but that is not quite what is happening here. Instead, it is addressing a support issue at Google.

Back in 2018, LWN reported on work being done to bring the Android kernel closer to the mainline. One of the steps in that direction is moving the kernel itself into the Android generic system image (GSI), an Android build that must boot and run on a device for that device to be considered compliant with the Android requirements. Putting the kernel into the GSI means that hardware vendors can no longer modify it; they will be limited to what they can do by adding kernel modules to the GSI.

Restricting vendors to supplying kernel modules greatly limits the kind of changes they can make; there will be no more Android devices that replace the CPU scheduler with some vendor's special version, for example. But that only holds if modules are restricted to the exported-symbol interface; if they start to reach into arbitrary parts of the kernel, all bets are off. Deacon doesn't say so, but it seems clear that some vendors are, at a minimum, thinking about doing exactly that. The business-friendly explanation for removing this capability is: "Monitoring and managing the ABI surface is not feasible if it effectively includes all data and functions via kallsyms_lookup_name()".

After seeing this explanation, Hiramatsu agreed that the patch makes sense and offered a Reviewed-by tag. So this concern, too, seems unlikely to prevent this patch set from being merged.

It's worth repeating that discouraging module developers from bypassing the export mechanism is generally seen as more than sufficient motivation to merge a change like this. But it is also interesting to see a large company supporting that kind of change as well. By more closely tying the Android kernel to the mainline, Google would appear to be better aligning its own interests with the long-term interests of the development community — on this point, at least. That, hopefully, will lead to better kernels on our devices that also happen to be a lot closer to mainline kernels.

Comments (14 posted)

Python time-zone handling

By Jake Edge
March 4, 2020

Handling time zones is a pretty messy affair overall, but language runtimes may have even bigger problems. As a recent discussion on the Python discussion forum shows, there are considerations beyond those that an operating system or distribution needs to handle. Adding support for the IANA time zone database to the Python standard library, which would allow using names like "America/Mazatlan" to designate time zones, is more complicated than one might think—especially for a language trying to support multiple platforms.

It may come as a surprise to some that Python has no support in the standard library for getting time-zone information from the IANA database (also known as the Olson database after its founder). The datetime module in the standard library has the idea of a "time zone" but populating an instance from the database is typically done using one of two modules from the Python Package Index (PyPI): pytz or dateutil. Paul Ganssle is the maintainer of dateutil and a contributor to datetime; he has put out a draft Python Enhancement Proposal (PEP) to add IANA database support as a new standard library module.

Ganssle gave a presentation at the 2019 Python Language Summit about the problem. On February 25, he posted a draft of PEP 615 ("Support for the IANA Time Zone Database in the Standard Library"). The original posted version of the PEP can be found in the PEPs GitHub repository. The datetime.tzinfo abstract base class provides ways "to implement arbitrarily complex time zone rules", but he has observed that users want to work with three time-zone types: fixed offsets from UTC, the system time zone, and IANA time zones. The standard library supports the first type with datetime.timezone objects, and the second to a certain extent, but does not support IANA time zones at all.
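
Of those three types, the first is already straightforward with the standard library, while the third currently requires one of the PyPI packages; a brief illustration (America/Mazatlan, mentioned above, is at UTC-7 in early March):

>>> from datetime import datetime, timezone, timedelta
>>> # A fixed UTC offset: supported today by datetime.timezone
>>> print(datetime(2020, 3, 4, 12, tzinfo=timezone(timedelta(hours=-7))))
2020-03-04 12:00:00-07:00
>>> # An IANA zone: currently needs a third-party package such as dateutil
>>> from dateutil import tz
>>> print(datetime(2020, 3, 4, 12, tzinfo=tz.gettz("America/Mazatlan")))
2020-03-04 12:00:00-07:00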

There are some wrinkles to handling time zones, starting with the fact that they change—frequently. The IANA database is updated multiple times per year; "between 1997 and 2020, there have been between 3 and 21 releases per year, often in response to changes in time zone rules with little to no notice". Linux and macOS have packages with that information which get updated as usual, but the situation for Windows is more complicated. Beyond that, there is a question of what should happen in a running program when the time-zone information changes out from under it.

The PEP proposes adding a top-level zoneinfo standard library module with a zoneinfo.ZoneInfo class for objects corresponding to a particular time zone. A call like:

    tz = zoneinfo.ZoneInfo("Australia/Brisbane")

will search for a corresponding Time Zone Information Format (TZif) file in various locations to populate the object. The zoneinfo.TZPATH list will be consulted to find the file of interest.

On Unix-like systems, that variable will be set to a list of the standard locations (e.g. /usr/share/zoneinfo, /etc/zoneinfo) where the time-zone data files are normally stored. On Windows, there is no official location for the system-wide time-zone information, so TZPATH will initially be empty. The PEP proposes that a data-only tzdata package be created for PyPI that would be maintained by the CPython core developers. That could be used on Windows systems to provide a source for the IANA database information.

By default, ZoneInfo objects would effectively be singletons; a cache would be maintained so that repeated uses of the same time-zone name would return the exact same object. That is not specifically being done for efficiency reasons, but to ensure that times in the same time zone will be handled correctly. The existing datetime arithmetic operations only consider time zones to be equal if they are the same object, not just if they contain the same information. But caching also protects running programs from strange behavior if the underlying time-zone data changes. Effectively, the data will be read once, on first use, and never change again until the interpreter is restarted.
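
A quick interactive sketch of what that caching means in practice (assuming the behavior described in the PEP; zoneinfo was still a proposal rather than part of the standard library when this was written):

>>> from zoneinfo import ZoneInfo
>>> # The cache makes repeated lookups of the same key return one object
>>> ZoneInfo("America/New_York") is ZoneInfo("America/New_York")
True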

There is support for loading time zones without consulting (or changing) the cache, as well as for clearing the cache, which would effectively reload the time zone for any new ZoneInfo object. But getting updates to time zones mid-stream is problematic in its own right, Ganssle said:

In the end, always getting “the latest data” is fraught with edge cases anyway, and the fact that datetime semantics rely on object identity rather than object equality just adds to the edge cases that are possible.

I will note that there is some precedent in this very area: local time information is only updated in response to a call to time.tzset(), and even that doesn’t work on Windows. The equivalent to calling time.tzset() to get updated time zone information would be calling ZoneInfo.clear_cache() to force ZoneInfo to use the updated data (or to always bypass the main constructor and use the .nocache() constructor).

But Florian Weimer was concerned that users would want those time-zone updates to automatically be incorporated, so he sees the caching behavior as problematic. "I do not think that users would want to restart their application (with a scheduled downtime) just to apply one of those updates." Ganssle acknowledged the concern, "but there are a lot of reasons to use the cache, and good reasons to believe that using the cache won’t be a problem". He went on to note that both pytz and dateutil already behave this way and he has heard no complaints. He also gave an example of surprising behavior without any caching:

>>> from datetime import *
>>> from zoneinfo import ZoneInfo
>>> dt0 = datetime(2020, 3, 8, tzinfo=ZoneInfo.nocache("America/New_York"))
>>> dt1 = dt0 + timedelta(1)
>>> dt2 = dt1.replace(tzinfo=ZoneInfo.nocache("America/New_York"))
>>> dt2 == dt1
True

Each call to ZoneInfo.nocache() will return a different object, even if the time-zone name is the same. So dt1 and dt2 have the same time-zone information, but different ZoneInfo objects. The two datetime objects compare "equal" (==) because they represent the same "wall time", but that does not mean that arithmetic operations will behave as one might expect:

>>> print(dt2 - dt1)
0:00:00
>>> print(dt2 - dt0)
23:00:00
>>> print(dt1 - dt0)
1 day, 0:00:00

March 8, 2020 is the day of the daylight saving time transition in the US, so adding one day (i.e. timedelta(1)) crosses that boundary. In a follow-up message, he explained more about the oddities of datetime math that are shown by the example:

This is because there’s an STD->DST transition between 2020-03-08 and 2020-03-09, so the difference in wall time is 24 hours, but the absolute elapsed time is 23 hours.

[...] So dt2 - dt0 is treated as two different zones and the math is done in UTC, whereas dt1 - dt0 is treated as the same zone, and the math is done in local time.

dt1 will necessarily be the same zone as dt0, because it’s the result of an arithmetical operation on dt0. dt2 is a different zone because I bypassed the cache, but if it hit the cache, the two would be the same.
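
For comparison, here is the same sequence when the cache is used, so that dt2 ends up with the very same ZoneInfo object and the subtraction is done as local "wall time" (again, a sketch of the behavior the PEP describes):

>>> dt0 = datetime(2020, 3, 8, tzinfo=ZoneInfo("America/New_York"))
>>> dt1 = dt0 + timedelta(1)
>>> dt2 = dt1.replace(tzinfo=ZoneInfo("America/New_York"))  # same cached object
>>> print(dt2 - dt0)
1 day, 0:00:00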

Using the pickle object-serialization mechanism on ZoneInfo objects was also discussed. The PEP originally proposed that pickling a ZoneInfo object would serialize all of the information from the object (e.g. all of the current and historical transition dates), rather than simply serializing the key (e.g. "America/New_York"). Only serializing the key could lead to problems when de-serializing the object with a different set of time-zone data (e.g. the "Asia/Qostanay" time zone was added in 2018).

But, as pytz maintainer Stuart Bishop pointed out, serializing all of the transition data is likely to lead to other, worse problems:

If I serialize ‘2022-06-05 14:00 Europe/Berlin’ today, and deserialize it in two years time after Berlin has ratified EU recommendations and abolished DST, then there are two possible results. If my application requires calendaring semantics, when deserializing I want to apply the current timezone definition, and my appointment at 2pm in Berlin is still at 2pm in Berlin. Because I need wallclock time (the time a clock hung on the wall in that location should show). If I wanted a fixed timestamp, best practice is to convert it to UTC to avoid all the potential traps, but it would also be ok to deserialize the time using the old, incorrect offset it was stored with and end up with 1pm wallclock time.

The PEP specifies that datetimes get serialized with all transition data. That seems unnecessary, as the transition data is reasonably likely to be wrong when it is de-serialized, and I can’t think of any use cases where you want to continue using the wrong data.

Ganssle agreed that it makes more sense to pickle ZoneInfo objects "by reference" (i.e. by time-zone name), though providing a way to also pickle "by value" for those who need or want it would be an option. Guido van Rossum had suggested an approach where a RawZoneInfo class would underlie ZoneInfo objects. Pickling a RawZoneInfo could be done by value. Ganssle liked that idea but thought that it could always be added later if there was a need for it; dateutil.tz already gives the by-value ability, so that could be used in the interim if needed.
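
Under the by-reference approach, only the key would survive a pickle round trip, with the transition data reloaded from whatever database is installed when the object is restored. A sketch of how that might look (assuming the .key attribute and by-reference pickling described in the PEP, not a guarantee of the final API):

>>> import pickle
>>> tz = ZoneInfo("Europe/Berlin")
>>> restored = pickle.loads(pickle.dumps(tz))
>>> restored.key  # only the key is serialized; data comes from the local database
'Europe/Berlin'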

Overall, the reaction to the PEP seems quite favorable. Bishop said that he looks forward to "being able to deprecate pytz, making it a thin wrapper around the standard library when run with a supported Python". Ganssle is still working out some of the details, particularly around whether to automatically install the tzdata module for platforms where there is no system-supplied IANA database. It seems likely that we will soon see support for IANA time zones in Python—presumably in Python 3.9 in October.

Comments (22 posted)

Page editor: Jonathan Corbet


Copyright © 2020, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds