Python time-zone handling
Handling time zones is a pretty messy affair overall, but language runtimes may have even bigger problems. As a recent discussion on the Python discussion forum shows, there are considerations beyond those that an operating system or distribution needs to handle. Adding support for the IANA time zone database to the Python standard library, which would allow using names like "America/Mazatlan" to designate time zones, is more complicated than one might think—especially for a language trying to support multiple platforms.
It may come as a surprise to some that Python has no support in the standard library for getting time-zone information from the IANA database (also known as the Olson database after its founder). The datetime module in the standard library has the idea of a "time zone", but populating an instance from the database is typically done using one of two modules from the Python Package Index (PyPI): pytz or dateutil. Paul Ganssle is the maintainer of dateutil and a contributor to datetime; he has put out a draft Python Enhancement Proposal (PEP) to add IANA database support as a new standard library module.
Ganssle gave a presentation at the 2019 Python Language Summit about the problem. On February 25, he posted a draft of PEP 615 ("Support for the IANA Time Zone Database in the Standard Library"). The original posted version of the PEP can be found in the PEPs GitHub repository.
The datetime.tzinfo abstract base class provides ways "to implement arbitrarily complex time zone rules", but he has observed that users want to work with three time-zone types: fixed offsets from UTC, the system time zone, and IANA time zones. The standard library supports the first type with datetime.timezone objects, and the second to a certain extent, but does not support IANA time zones at all.
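For reference, here is roughly what the first two kinds look like with the standard library alone. This is a small illustrative sketch, not code from the PEP; the date and zone name are arbitrary, and the IANA case is left as a comment because, before PEP 615, it needs pytz or dateutil.

```python
from datetime import datetime, timedelta, timezone

# 1. Fixed offset from UTC: fully supported by datetime.timezone.
brisbane_fixed = timezone(timedelta(hours=10), "AEST")
dt_fixed = datetime(2020, 2, 25, 9, 0, tzinfo=brisbane_fixed)

# 2. System local time zone: astimezone() with no argument attaches a
#    fixed-offset timezone built from the OS's current local offset.
dt_local = datetime.now().astimezone()

# 3. IANA zones such as "America/Mazatlan" have no standard-library
#    implementation before PEP 615; pytz or dateutil fill that gap.

print(dt_fixed.isoformat())   # 2020-02-25T09:00:00+10:00
print(dt_local.tzinfo)
```

The tzinfo attached by astimezone() is a plain datetime.timezone captured at call time, which is one reason the system-time-zone support only goes so far: that object knows nothing about the zone's future transitions.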
There are some wrinkles to handling time zones, starting with the fact that they change, and frequently. The IANA database is updated multiple times per year; "between 1997 and 2020, there have been between 3 and 21 releases per year, often in response to changes in time zone rules with little to no notice". On Linux and macOS, that information is shipped with the system and updated as usual, but the situation for Windows is more complicated. Beyond that, there is a question of what should happen in a running program when the time-zone information changes out from under it.
The PEP proposes adding a top-level zoneinfo standard library module with a zoneinfo.ZoneInfo class for objects corresponding to a particular time zone. A call like:
```python
tz = zoneinfo.ZoneInfo("Australia/Brisbane")
```
will search for a corresponding Time Zone Information Format (TZif) file in various locations to populate the object. The zoneinfo.TZPATH list will be consulted to find the file of interest.
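The lookup itself can be pictured as a walk over the directories in the search path. The sketch below is a simplified model, not the zoneinfo implementation: find_tzif() is a hypothetical helper, and rather than parsing the file it only checks the four-byte "TZif" magic number that begins every file in that format (RFC 8536).

```python
import os

TZIF_MAGIC = b"TZif"  # every TZif file begins with these four bytes

def find_tzif(key, tzpath):
    """Return the path of the first TZif file for `key` in `tzpath`.

    Hypothetical helper illustrating the search, not zoneinfo's own code.
    """
    # Reject keys that could escape the search directories.
    if os.path.isabs(key) or ".." in key.split("/"):
        raise ValueError(f"invalid time-zone key: {key!r}")
    for directory in tzpath:
        candidate = os.path.join(directory, *key.split("/"))
        if not os.path.isfile(candidate):
            continue
        with open(candidate, "rb") as f:
            if f.read(4) == TZIF_MAGIC:   # skip files that aren't TZif data
                return candidate
    raise LookupError(f"no time-zone data found for {key!r}")
```

On a typical Linux system, find_tzif("Australia/Brisbane", ["/usr/share/zoneinfo", "/etc/zoneinfo"]) would return a path under the first directory; earlier entries in the list win.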
On Unix-like systems, that variable will be set to a list of the standard locations (e.g. /usr/share/zoneinfo, /etc/zoneinfo) where the time-zone data files are normally stored. On Windows, there is no official location for the system-wide time-zone information, so TZPATH will initially be empty. The PEP proposes that a data-only tzdata package be created for PyPI that would be maintained by the CPython core developers. That could be used on Windows systems to provide a source for the IANA database information.
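The platform-dependent starting point and the fallback to a data package might look something like this. default_tzpath() and pick_source() are hypothetical helpers written for illustration; the only real name used is tzdata, the PyPI package the PEP proposes, and the sketch merely checks whether it is importable. The two Unix directories are the examples given above, not an exhaustive list.

```python
import importlib.util
import os

# Hypothetical defaults, using the directories mentioned in the text.
UNIX_TZPATH = ["/usr/share/zoneinfo", "/etc/zoneinfo"]

def default_tzpath(platform):
    """Initial search path: standard directories on Unix, empty on Windows."""
    if platform.startswith("win"):
        return []
    return list(UNIX_TZPATH)

def pick_source(key, tzpath):
    """Decide where data for `key` would come from.

    Returns ("system", directory), ("tzdata", None), or None.
    """
    for directory in tzpath:
        if os.path.isfile(os.path.join(directory, key)):
            return ("system", directory)
    # Fall back to the data-only package, if installed (not imported here).
    if importlib.util.find_spec("tzdata") is not None:
        return ("tzdata", None)
    return None
```

Calling pick_source(key, default_tzpath(sys.platform)) on Windows skips straight to the tzdata check, which is the situation the PyPI package is meant to cover.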
By default, ZoneInfo objects would effectively be singletons; a cache would be maintained so that repeated uses of the same time-zone name would return the exact same object. That is not specifically being done for efficiency reasons, but to ensure that times in the same time zone will be handled correctly. The existing datetime arithmetic operations only consider time zones to be equal if they are the same object, not just if they contain the same information. But caching also protects running programs from strange behavior if the underlying time-zone data changes. Effectively, the data will be read once, on first use, and never change again until the interpreter is restarted.
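A minimal sketch of that caching behavior, assuming a hypothetical Zone class in place of ZoneInfo. The nocache() and clear_cache() names mirror those in the draft PEP, but the bodies are illustrative only; _from_source() just counts how often data would be loaded instead of reading a file.

```python
class Zone:
    """Hypothetical stand-in for ZoneInfo, illustrating the proposed cache."""
    _cache = {}
    loads = 0   # how many times data has been "read"

    def __new__(cls, key):
        # Primary constructor: reuse the object for a key seen before.
        if key in cls._cache:
            return cls._cache[key]
        obj = cls._from_source(key)
        cls._cache[key] = obj
        return obj

    @classmethod
    def nocache(cls, key):
        # Always build a fresh object; the cache is neither read nor updated.
        return cls._from_source(key)

    @classmethod
    def clear_cache(cls):
        # Later constructor calls will load the data again.
        cls._cache.clear()

    @classmethod
    def _from_source(cls, key):
        obj = object.__new__(cls)
        obj.key = key
        cls.loads += 1   # stands in for parsing a TZif file
        return obj
```

Because every Zone("Australia/Brisbane") call returns the same object until the cache is cleared, datetimes created in different parts of a program share one tzinfo, which is the condition datetime checks before doing same-zone arithmetic.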
There is support for loading time zones without consulting (or changing) the cache, as well as for clearing the cache, which would effectively reload the time zone for any new ZoneInfo object. But getting updates to time zones mid-stream is problematic in its own right, Ganssle said:
I will note that there is some precedent in this very area: local time information is only updated in response to a call to time.tzset(), and even that doesn’t work on Windows. The equivalent to calling time.tzset() to get updated time zone information would be calling ZoneInfo.clear_cache() to force ZoneInfo to use the updated data (or to always bypass the main constructor and use the .nocache() constructor).
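The time.tzset() precedent can be seen directly. This example is Unix-only (time.tzset() does not exist on Windows) and uses POSIX TZ strings of the kind shown in the time module's documentation, so no IANA data is needed; it restores the original TZ setting at the end.

```python
import os
import time

saved_tz = os.environ.get("TZ")

os.environ["TZ"] = "UTC0"              # POSIX TZ string: no IANA files needed
time.tzset()                           # Unix only; absent on Windows
before = time.timezone                 # seconds west of UTC

os.environ["TZ"] = "EST+05EDT,M4.1.0,M10.5.0"
stale = time.timezone                  # unchanged: nothing has re-read TZ yet
time.tzset()                           # now the module picks up the change
after = time.timezone

print(before, stale, after)            # 0 0 18000

# Put the process back the way we found it.
if saved_tz is None:
    del os.environ["TZ"]
else:
    os.environ["TZ"] = saved_tz
time.tzset()
```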
But Florian Weimer was concerned that users would want those time-zone updates to automatically be incorporated, so he sees the caching behavior as problematic. "I do not think that users would want to restart their application (with a scheduled downtime) just to apply one of those updates." Ganssle acknowledged the concern, "but there are a lot of reasons to use the cache, and good reasons to believe that using the cache won’t be a problem".
He went on to note that both pytz and dateutil already behave this way and he has heard no complaints. He also gave an example of surprising behavior without any caching:
```python
>>> from datetime import *
>>> from zoneinfo import ZoneInfo
>>> dt0 = datetime(2020, 3, 8, tzinfo=ZoneInfo.nocache("America/New_York"))
>>> dt1 = dt0 + timedelta(1)
>>> dt2 = dt1.replace(tzinfo=ZoneInfo.nocache("America/New_York"))
>>> dt2 == dt1
True
```
Each call to ZoneInfo.nocache() will return a different object, even if the time-zone name is the same. So dt1 and dt2 have the same time-zone information, but different ZoneInfo objects. The two datetime objects compare "equal" (==) because they represent the same "wall time", but that does not mean that arithmetic operations will behave as one might expect:
```python
>>> print(dt2 - dt1)
0:00:00
>>> print(dt2 - dt0)
23:00:00
>>> print(dt1 - dt0)
1 day, 0:00:00
```
March 8, 2020 is the day of the daylight saving time transition in the US, so adding one day (i.e. timedelta(1)) crosses that boundary: dt0 is midnight at UTC-5, while dt1 is midnight the next day at UTC-4, so only 23 hours of real time separate them. In a follow-up message, he explained more about the oddities of datetime math that are shown by the example:
[...] So dt2 - dt0 is treated as two different zones and the math is done in UTC, whereas dt1 - dt0 is treated as the same zone, and the math is done in local time.
dt1 will necessarily be the same zone as dt0, because it’s the result of an arithmetical operation on dt0. dt2 is a different zone because I bypassed the cache, but if it hit the cache, the two would be the same.
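Since the behavior comes from datetime itself rather than from the data, it can be reproduced without any IANA files by using a hand-written tzinfo subclass. ToyEastern is hypothetical: a stand-in for America/New_York that knows about only the March 8, 2020 transition, and the "different object" is produced simply by instantiating the class twice rather than by ZoneInfo.nocache(). The printed results match the session above.

```python
from datetime import datetime, timedelta, tzinfo

class ToyEastern(tzinfo):
    """Hypothetical US-Eastern-like zone with one hard-coded transition:
    UTC-5 before 2020-03-08 02:00 local time, UTC-4 from then on."""
    TRANSITION = datetime(2020, 3, 8, 2)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def dst(self, dt):
        after = dt.replace(tzinfo=None) >= self.TRANSITION
        return timedelta(hours=1) if after else timedelta(0)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"

dt0 = datetime(2020, 3, 8, tzinfo=ToyEastern())
dt1 = dt0 + timedelta(1)                  # same tzinfo object as dt0
dt2 = dt1.replace(tzinfo=ToyEastern())    # identical rules, different object

print(dt2 == dt1)   # True: inter-zone comparison goes through UTC
print(dt2 - dt1)    # 0:00:00
print(dt2 - dt0)    # 23:00:00 (different objects: computed in UTC)
print(dt1 - dt0)    # 1 day, 0:00:00 (same object: wall-time difference)
```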
Using the pickle object-serialization mechanism on ZoneInfo objects was also discussed. The PEP originally proposed that pickling a ZoneInfo object would serialize all of the information from the object (e.g. all of the current and historical transition dates), rather than simply serializing the key (e.g. "America/New_York"). Only serializing the key could lead to problems when deserializing the object with a different set of time-zone data (e.g. the "Asia/Qostanay" time zone was added in 2018).
But, as pytz maintainer Stuart Bishop pointed out, serializing all of the transition data is likely to lead to other, worse problems:
The PEP specifies that datetimes get serialized with all transition data. That seems unnecessary, as the transition data is reasonably likely to be wrong when it is de-serialized, and I can’t think of any use cases where you want to continue using the wrong data.
Ganssle agreed that it makes more sense to pickle ZoneInfo objects "by reference" (i.e. by time-zone name), though providing a way to also pickle "by value" for those who need or want it would be an option. Guido van Rossum had suggested an approach where a RawZoneInfo class would underlie ZoneInfo objects. Pickling a RawZoneInfo could be done by value. Ganssle liked that idea but thought that it could always be added later if there was a need for it; dateutil.tz already gives the by-value ability, so that could be used in the interim if needed.
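The difference between the two pickling strategies comes down to what gets handed to pickle. In this sketch, RefZone and RawZone are hypothetical classes (the latter loosely named after van Rossum's RawZoneInfo idea), and the transition list is stand-in data rather than anything read from the database.

```python
import pickle

class RefZone:
    """Hypothetical zone object that pickles by reference (key only)."""
    _cache = {}

    def __new__(cls, key):
        # Cached construction, as with ZoneInfo's primary constructor.
        if key not in cls._cache:
            obj = super().__new__(cls)
            obj.key = key
            obj.transitions = cls._load(key)
            cls._cache[key] = obj
        return cls._cache[key]

    @staticmethod
    def _load(key):
        # Stand-in for reading TZif data (same list for every key):
        # (UTC instant, new UTC offset in seconds).
        return [("2020-03-08T07:00Z", -4 * 3600)]

    def __reduce__(self):
        # Only the key is serialized; unpickling calls RefZone(key) again,
        # which goes through the cache and the local copy of the data.
        return (RefZone, (self.key,))


class RawZone:
    """Hypothetical by-value variant: the transition data is serialized too."""

    def __init__(self, key, transitions):
        self.key = key
        self.transitions = list(transitions)
    # No __reduce__ needed: default pickling copies every attribute.


ny = RefZone("America/New_York")
print(pickle.loads(pickle.dumps(ny)) is ny)   # True

raw = RawZone("America/New_York", ny.transitions)
restored = pickle.loads(pickle.dumps(raw))
print(restored is raw, restored.transitions == raw.transitions)  # False True
```

Unpickling a RefZone runs the constructor again, so within one process it lands on the cached object, and on another system it picks up that system's current data; a RawZone carries its possibly outdated transitions along with it, which is the trade-off Bishop and Ganssle were weighing.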
Overall, the reaction to the PEP seems quite favorable. Bishop said that he looks forward to "being able to deprecate pytz, making it a thin wrapper around the standard library when run with a supported Python". Ganssle is still working out some of the details, particularly around whether to automatically install the tzdata package for platforms where there is no system-supplied IANA database. It seems likely that we will soon see support for IANA time zones in Python, presumably in Python 3.9 in October.