Hi, it’s me, back again with another update to pg_clickhouse, the query
interface for ClickHouse from Postgres. This release, v0.1.10,
maintains binary compatibility with earlier versions but ships a number of
significant improvements that increase compatibility of Postgres features with
ClickHouse. Highlights include:
Mappings for the JSON and JSONB -> text and ->> text operators, as
well as jsonb_extract_path_text() and jsonb_extract_path(), to be pushed
down to ClickHouse using its sub-column syntax.
Mappings to push down the Postgres statement_timestamp(),
transaction_timestamp(), and clock_timestamp() functions, as well as
the Postgres “SQL Value Functions”, including CURRENT_TIMESTAMP,
CURRENT_USER, and CURRENT_DATABASE.
And the big one: mappings to push down compatible window functions,
including ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG, FIRST_VALUE,
LAST_VALUE, NTH_VALUE, NTILE, CUME_DIST, PERCENT_RANK, and
MIN/MAX OVER.
Oh yeah, the other big one: added result set streaming to the HTTP
driver. Rather than loading all the results into memory at once, the driver
now streams them as they arrive. A test loading a 1GB table reduced peak
memory consumption from over 1GB to 73MB.
We’ll work up a longer post to show off some of these features in the next
week. But in the meantime, git it while it’s hot!
We fixed a few bugs this week in pg_clickhouse, the query interface for
ClickHouse from Postgres. It features improved query cancellation and function
& operator pushdown, including to_timestamp(float8), ILIKE, LIKE, and
regex operators. Get the new v0.1.6 release from the usual places:
I’ve been busy with an internal project at work, but have responded to a few
pg_clickhouse reports for a couple of crashes and vulnerabilities, thanks to
pen testing and a community security report. These changes drive the
release of v0.1.5 today.
Just a quick post to note the release of pg_clickhouse v0.1.4. This v0.1
maintenance release can be upgraded in-place and requires no
ALTER EXTENSION UPDATE command; as soon as sessions reload the shared
library they’ll be good to go.
Thanks in part to reports from attentive users, v0.1.4’s most significant
changes improve the following:
The binary driver now properly inserts NULL into a Nullable(T) column.
Previously it would raise an error.
The http driver now properly parses arrays. Previously it improperly
included single quotes in string items and would choke on brackets ([])
in values.
Both drivers now support mapping the ClickHouse String type to Postgres
BYTEA columns. Previously they worked only with text types, which is
generally preferred. But since ClickHouse explicitly supports binary data
in String values (notably hash function return values), pg_clickhouse
needs to support it, as well.
My thanks to pg_clickhouse users like Rahul Mehta for reporting issues, and
to my employer, ClickHouse, for championing this extension. Next up: more
aggregate function mapping, hash function pushdown, and improved subquery
(specifically, SubPlan) pushdown.
Today I released v1.7.0 of the pgxn-tools OCI image, which simplifies
Postgres extension testing and PGXN distribution. The new version includes
just a few updates and improvements:
Upgraded the Debian base image from Bookworm to Trixie
Set the PGUSER environment variable to postgres in the Dockerfile,
removing the need for users to remember to do it.
Updated pg-build-test to set MAKEFLAGS="-j $(nproc)" to shorten
build runtimes.
Also updated pgrx-build-test to pass -j $(nproc), for the same
reason.
Upgraded the pgrx test extension to v0.16.1 and test it on Postgres
versions 13-16.
Just a security and quality of coding life release. Ideally existing workflows
will continue to work as they always have.
Please welcome dmjwk into the world. This “demo JWK” (or “dumb JWK” if you
like) service provides super simple Identity Provider APIs strictly for demo
purposes.
Say you’ve written a service that depends on a public JSON Web Key (JWK) set
to authenticate JSON Web Tokens (JWT) submitted as OAuth 2 Bearer
Tokens. Your users will normally configure the service to use an
internal or well-known provider, such as Auth0, Okta, or AWS. Such
providers might be too heavyweight for demo purposes, however.
For my own use, I needed nothing more than a Docker Compose file with
local-only services. I also wanted some control over the contents of the
tokens, since my service records the sub field from the JWT in an audit trail,
something like 1a1077e6-3b87-1282-789c-f70e66dab825 (as in Vault JWTs)
makes for less-than-friendly text to describe in a demo.
I created dmjwk to scratch this itch. It provides a basic Resource Owner
Password Credentials Grant OAuth 2 flow to create custom JWTs, a well-known
URL for the public JWK set, and a simple API that validates JWTs. None of it
is real, it’s all for show, but the show’s the point.
Quick Start
The simplest way to start dmjwk is with its OCI image (there are binaries
for 40 platforms, as well). It listens on port 443; since hosts commonly
reserve that port, let’s map it to 4433 instead:
docker run -d -p 4433:443 --name dmjwk --volume .:/etc/dmjwk ghcr.io/theory/dmjwk
This command fires up dmjwk with a self-signed TLS certificate for localhost
and creates a root cert bundle, ca.pem, in the current directory. Use it
with your favorite HTTP client to make validated requests.
Configure services to use this URL,
https://localhost:4433/.well-known/jwks.json, to validate JWTs created by
dmjwk.
Authorization
To fetch a JWT signed by the first key in the JWK set (just the one in this
example), make an application/x-www-form-urlencoded POST with the required
grant_type, username, and password fields:
dmjwk stores no actual usernames and passwords; it’s all for show. Provide any
username you like and Base64-encode the username, without trailing equal
signs, as the password.
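Assuming curl and a Base64 helper, the request might look like this. Note
that the /token endpoint path and the grant_type value are assumptions based
on the standard Resource Owner Password Credentials flow, not taken from the
dmjwk docs:

```shell
# Hypothetical sketch: request a JWT from a local dmjwk instance.
USERNAME=alice
# The password is the Base64-encoded username, stripped of '=' padding:
PASSWORD=$(printf '%s' "$USERNAME" | base64 | tr -d '=')
echo "$PASSWORD"  # prints YWxpY2U

# The actual request (endpoint path assumed; run against a live instance):
# curl --cacert ca.pem \
#   --data-urlencode grant_type=password \
#   --data-urlencode "username=$USERNAME" \
#   --data-urlencode "password=$PASSWORD" \
#   https://localhost:4433/token
```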
{"error":"invalid_token","error_description":"token is malformed: token contains an invalid number of segments"}
That’s It
dmjwk includes a fair number of configuration options, including external
certificates, custom host naming (useful with Docker Compose), and multiple
key generation. If you find it useful for your demos (but not for production
— DON’T DO THAT), let me know. And if not, that’s fine, too. This is
a bit of my pursuit of a thick desire, made mainly for me, but it pleases me
if others find it helpful too.
I wanted to optimize away parsing the key/value pairs from the
pg_clickhouse.session_settings GUC for every query by pre-parsing it on
assignment and storing the result in a separate variable. It took a few
tries to land on a workable and correct solution, as the GUC API requires
quite specific memory allocation for extra data to work properly.
Struggling to understand, making missteps, and ultimately coming to a
reasonable design and solution satisfies me so immensely that I always want to
share. This piece gets down in the C coding weeds; my fellow extension coders
might enjoy it.
The ClickHouse blog has posted a piece by yours truly introducing
pg_clickhouse, a PostgreSQL extension to run ClickHouse queries from
PostgreSQL:
While clickhouse_fdw and its predecessor, postgres_fdw, provided the
foundation for our FDW, we set out to modernize the code & build process, to
fix bugs & address shortcomings, and to engineer it into a complete product
featuring near universal pushdown for analytics queries and aggregations.
Such advances include:
Adopting standard PGXS build pipeline for PostgreSQL extensions
Adding prepared INSERT support to and adopting the latest supported
I’ve spent most of the last couple months working on this project, learning a
ton about ClickHouse, foreign data wrappers, C and C++, and query
pushdown. Interested? Try out the Docker image:
Out today: Sqitch v1.6.0. This release adds a brand new engine:
ClickHouse. I started a new job at ClickHouse on September 2, and my first
task, as a way to get to know the database, was to add it to Sqitch.
Fortuitously, ClickHouse added support for updates and deletes, which Sqitch
requires, in the August release. Sqitch v1.6.0 therefore supports ClickHouse
25.8 or later.
A quick note for PostgreSQL extension maintainers: PostgreSQL 18 introduces a
new macro: PG_MODULE_MAGIC_EXT. Use it to name and version your modules.
Where your module .c file likely has:
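Something like this, with a placeholder module name and version:

```c
/* The existing magic block: */
PG_MODULE_MAGIC;

/* Becomes, on PostgreSQL 18: */
PG_MODULE_MAGIC_EXT(
    .name = "my_module",
    .version = "1.2.3"
);
```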
Replace the name of your module and the version as appropriate. Note that
PG_MODULE_MAGIC was added in Postgres 8.2; if for some reason your module
still supports earlier versions, use a nested #ifdef to conditionally
execute it:
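A sketch of the conditional form (placeholder name and version again):

```c
#ifdef PG_MODULE_MAGIC_EXT
PG_MODULE_MAGIC_EXT(
    .name = "my_module",
    .version = "1.2.3"
);
#else
PG_MODULE_MAGIC;
#endif
```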
If you manage the module version in your Makefile, as the PGXN Howto
suggests, consider renaming the .c file to .c.in and changing the
Makefile like so:
Replace .version = "1.2.3" with .version = "__VERSION__"
Add src/$(EXTENSION).c to EXTRA_CLEAN
Add this make target:
src/$(EXTENSION).c: src/$(EXTENSION).c.in
sed -e 's,__VERSION__,$(EXTVERSION),g' $< > $@
The rest of the talk encompasses newer work. Read on for details.
Automated Packaging Challenges
Back in December I took over maintenance of the Trunk registry,
a.k.a., pgt.dev, refactoring and upgrading all 200+ extensions and adding
Postgres 17 builds. This experience opened my eyes to the wide variety of
extension build patterns and configurations, even when supporting a single OS
(Ubuntu 22.04 “Jammy”). Some examples:
pglogical requires an extra make param to build on PostgreSQL 17:
make -C LDFLAGS_EX="-L/usr/lib/postgresql/17/lib"
Some pgrx extensions require additional params, for example:
vectorscale requires the environment variable
RUSTFLAGS="-C target-feature=+avx2,+fma"
pljava needs a pointer to libjvm:
mvn clean install -Dpljava.libjvmdefault=/usr/lib/x86_64-linux-gnu/libjvm.so
plrust needs files to be moved around, a shell script to be run, and to
be built from a subdirectory
bson also needs files to be moved around and a pointer to libbson
timescale requires an environment variable and shell script to run
before building
Many extensions require patching to build for various configurations and
OSes, like this tweak to build pguri on Postgres 17 and this patch
to get duckdb_fdw to build at all
Doubtless there’s much more. These sorts of challenges led the RPM and APT
packaging systems to support explicit scripting and patches for every package.
I don’t think it would be sensible to support build scripting in the meta
spec.
However, the PGXN meta SDK I developed last year supports the merging of
multiple META.json files, so that downstream packagers could maintain files
with additional configurations, including explicit build steps or lists of
packages, to support these use cases.
Furthermore, the plan to add reporting to PGXN v2 means that downstream
packages could report build failures, which would appear on PGXN, where they’d
encourage some maintainers, at least, to fix issues within their control.
Dependency Resolution
Dependencies present another challenge. The v2 spec supports third
party dependencies — those not part of Postgres itself or the ecosystem of
extensions. Ideally, an extension like pguri would define its dependence on
the uriparser library like so:
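A sketch of what that declaration might look like in META.json — the exact
field names and shape are an assumption here; consult the draft v2 spec:

```json
{
  "dependencies": {
    "packages": {
      "build": { "requires": ["pkg:generic/uriparser"] },
      "run":   { "requires": ["pkg:generic/uriparser"] }
    }
  }
}
```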
An intelligent build client will parse the dependencies, provided as purls,
to determine the appropriate OS packages to install to satisfy. For example,
building on a Debian-based system, it would know to install liburiparser-dev
to build the extension and require liburiparser1 to run it.
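As a toy illustration of that resolution step — the mapping table and
function names are invented; a real client would consult a maintained
purl-to-package database:

```python
from typing import Optional

# Hypothetical mapping from a generic purl to the Debian packages needed
# to build and to run an extension that depends on it.
DEBIAN_PACKAGES = {
    "pkg:generic/uriparser": ("liburiparser-dev", "liburiparser1"),
}

def purl_base(purl: str) -> str:
    """Strip any version qualifier, e.g. 'pkg:generic/uriparser@0.9.8'."""
    return purl.split("@", 1)[0]

def debian_packages(purl: str) -> Optional[tuple]:
    """Return (build_dep, runtime_dep) for a known purl, else None."""
    return DEBIAN_PACKAGES.get(purl_base(purl))

print(debian_packages("pkg:generic/uriparser@0.9.8"))
# ('liburiparser-dev', 'liburiparser1')
```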
With the aim to support multiple OSes and versions — not to mention Postgres
versions — the proposed PGXN binary registry would experience quite the
combinatorial explosion to support all possible dependencies on all possible
OSes and versions. While I propose to start simple (Linux and macOS, Postgres
14-18) and gradually grow, it could quickly get quite cumbersome.
So much so that I can practically hear Christoph’s and Devrim’s reactions
from here:
Photo of Christoph, Devrim, and other long-time packagers laughing at me.
Or perhaps:
Photo of Christoph and Devrim laughing at me.
I hardly blame them.
A CloudNativePG Side Quest
Gabriele Bartolini blogged the proposal to deploy extensions to
CloudNativePG containers without violating the immutability of the
container. The introduction of the extension_control_path GUC in Postgres
18 and the ImageVolume feature in Kubernetes 1.33 enable the pattern, likely
to be introduced in CloudNativePG v1.27. Here’s a sample CloudNativePG cluster
manifest with the proposed extension configuration:
The extensions object at lines 9-12 configures pgvector simply by
referencing an OCI image that contains nothing but the files for the
extension. To “install” the extension, the proposed patch triggers a rolling
update, replicas first. For each instance, it takes the following steps:
Mounts each extension as a read-only ImageVolume under /extensions; in
this example, /extensions/vector provides the complete contents of the
image
Updates LD_LIBRARY_PATH to include the path to the lib directory of
each extension, e.g., /extensions/vector/lib.
Updates the extension_control_path and dynamic_library_path GUCs to
point to the share and lib directories of each extension, in this
example:
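For a single pgvector extension mounted at /extensions/vector, the resulting
settings would look something like this — illustrative values, not taken
from the patch itself:

```
extension_control_path = '$system:/extensions/vector/share'
dynamic_library_path = '$libdir:/extensions/vector/lib'
```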
In other words, every additional extension requires another prefix to be
appended to each of these configurations. Ideally we could use a single prefix
for all extensions, avoiding the need to update these configs and therefore to
restart Postgres. Setting aside the ImageVolume limitation2 for the
moment, this pattern would require no rolling restarts and no GUC updates
unless a newly-added extension requires pre-loading via
shared_preload_libraries.
Getting there, however, requires a different extension file layout than
PostgreSQL currently uses.
RFC: Extension Packaging and Lookup
Imagine this:
A single extension search path GUC
Each extension in its own eponymous directory
Pre-defined subdirectory names used inside each extension directory
For this pattern, Postgres would look for the appropriately-named
directory with a control file in each of the paths. To find the semver
extension, for example, it would find /extensions/semver/semver.control.
All the other files for the extension would live in specifically-named
subdirectories: doc for documentation files, lib for shared libraries,
sql for SQL deployment files, plus bin, man, html, include,
locale, and any other likely resources.
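The lookup this implies can be sketched in a few lines — a toy model of the
proposal, not Postgres code:

```python
import os
import tempfile

def find_extension(name, search_path):
    """Sketch of the proposed lookup: scan each path for an eponymous
    directory containing the extension's control file."""
    for prefix in search_path:
        control = os.path.join(prefix, name, name + ".control")
        if os.path.isfile(control):
            return control
    return None

# Demo: lay out <root>/extensions/semver/semver.control and find it.
root = tempfile.mkdtemp()
ext_dir = os.path.join(root, "extensions", "semver")
os.makedirs(ext_dir)
open(os.path.join(ext_dir, "semver.control"), "w").close()
print(find_extension("semver", [os.path.join(root, "extensions")]))
```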
With all of the files required for an extension bundled into well-defined
subdirectories of a single directory, it lends itself to the layout of the
proposed binary distribution format. Couple it with OCI
distribution and it becomes a natural fit for ImageVolume deployment:
simply map each extension OCI image to a subdirectory of the desired search
path and you’re done. The extensions object in the CNPG Cluster manifest
remains unchanged, and CNPG no longer needs to manipulate any GUCs.
Some might recognize this proposal from a previous RFC post. It not only
simplifies the CloudNativePG use cases, but because it houses all of the files
for an extension in a single bundle, it also vastly simplifies installation
on any system:
Download the extension package
Validate its signature & contents
Unpack its contents into a directory named for the extension in the
extension search path
Simple!
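The unpack step reduces to a single tar invocation. A toy demonstration with
invented paths and a locally-built dummy package standing in for a real
Trunk archive:

```shell
# Stage a dummy package containing just a control file.
mkdir -p /tmp/trunk-demo/stage/semver
touch /tmp/trunk-demo/stage/semver/semver.control
tar -C /tmp/trunk-demo/stage -cf /tmp/trunk-demo/semver.tar semver

# Step 3: unpack into an eponymous directory on the extension search path.
mkdir -p /tmp/trunk-demo/extensions
tar -xf /tmp/trunk-demo/semver.tar -C /tmp/trunk-demo/extensions
ls /tmp/trunk-demo/extensions/semver  # prints semver.control
```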
Fun With Dependencies
Many extensions depend on external libraries, and rely on the OS to find them.
OS packagers follow the dependency patterns of their packaging systems:
require the installation of other packages to satisfy the dependencies.
How could such a pattern be generalized by the Trunk Packaging Format to work
on all OSes? I see two potential approaches:
List the dependencies as purls that the installing client translates to
the appropriate OS packages it installs.
Bundle dependencies in the Trunk package itself
Option 1 will work well for most use cases, but not immutable systems like
CloudNativePG. Option 2 could work for such situations. But perhaps you
noticed the omission of LD_LIBRARY_PATH manipulation in the packaging and
lookup discussion above. Setting aside the multitude of reasons to avoid
LD_LIBRARY_PATH3, how else could the OS find shared libraries needed by
an extension?
Typically, one installs shared libraries in one of a few directories known to
tools like ldconfig, which must run after each install to cache their
locations. But one cannot rely on ldconfig in immutable environments,
because the cache of course cannot be mutated.
We could, potentially, rely on rpath, a feature of modern dynamic linkers
that reads a list of known paths from the header of a binary file. In fact,
most modern OSes support $ORIGIN as an rpath value4 (or
@loader_path on Darwin/macOS), which refers to the same directory in which
the binary file appears. Imagine this pattern:
The Trunk package for an extension includes dependency libraries alongside
the extension module
The module is compiled with rpath=$ORIGIN
To test this pattern, let’s install the Postgres 18 beta and try the pattern
with the pguri extension. First, remove the $libdir/ prefix (as discussed
previously) and patch the extension for Postgres 17+:
The second line of output shows that it does in fact find liburiparser.so.1
where we put it. So far so good. Just need to tell the GUCs where to find them
and restart Postgres:
Now use ldd to see what shared libraries it needs:
❯ ldd /usr/local/postgresql/lib/http.so
linux-vdso.so.1
libcurl.so.4 => not found
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6
Naturally it needs libcurl; let’s copy it from another system and try again:
❯ scp dev:libcurl.so.4 /usr/local/postgresql/lib/
❯ ldd /usr/local/postgresql/lib/http.so
linux-vdso.so.1
libcurl.so.4 => /usr/local/postgresql/lib/libcurl.so.4
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6
/lib/ld-linux-aarch64.so.1
libnghttp2.so.14 => not found
libidn2.so.0 => /lib/aarch64-linux-gnu/libidn2.so.0
librtmp.so.1 => not found
libssh.so.4 => not found
libpsl.so.5 => not found
libssl.so.3 => /lib/aarch64-linux-gnu/libssl.so.3
libcrypto.so.3 => /lib/aarch64-linux-gnu/libcrypto.so.3
libgssapi_krb5.so.2 => /lib/aarch64-linux-gnu/libgssapi_krb5.so.2
libldap.so.2 => not found
liblber.so.2 => not found
libzstd.so.1 => /lib/aarch64-linux-gnu/libzstd.so.1
libbrotlidec.so.1 => not found
libz.so.1 => /lib/aarch64-linux-gnu/libz.so.1
Line 4 shows it found libcurl.so.4 where we put it, but the rest of the
output lists a bunch of new dependencies that need to be satisfied. These did
not appear before because the http.so module doesn’t depend on them; the
libcurl.so library does. Let’s add libnghttp2 and try again:
❯ scp dev:libnghttp2.so.14 /usr/local/postgresql/lib/
❯ ldd /usr/local/postgresql/lib/http.so
linux-vdso.so.1
libcurl.so.4 => /usr/local/postgresql/lib/libcurl.so.4
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6
/lib/ld-linux-aarch64.so.1
libnghttp2.so.14 => not found
libidn2.so.0 => /lib/aarch64-linux-gnu/libidn2.so.0
librtmp.so.1 => not found
libssh.so.4 => not found
libpsl.so.5 => not found
libssl.so.3 => /lib/aarch64-linux-gnu/libssl.so.3
libcrypto.so.3 => /lib/aarch64-linux-gnu/libcrypto.so.3
libgssapi_krb5.so.2 => /lib/aarch64-linux-gnu/libgssapi_krb5.so.2
libldap.so.2 => not found
liblber.so.2 => not found
libzstd.so.1 => /lib/aarch64-linux-gnu/libzstd.so.1
libbrotlidec.so.1 => not found
libz.so.1 => /lib/aarch64-linux-gnu/libz.so.1
Sadly, as line 7 shows, it still can’t find libnghttp2.so.
It turns out that rpath works only for immediate dependencies. To solve this
problem, libcurl and all other shared libraries must also be compiled with
rpath=$ORIGIN — which means we can’t simply copy those libraries from OS
packages5. In the meantime, only direct dependencies can be
bundled with an extension.
Project Status
The vision of accessible, easy-install extensions everywhere remains intact.
I’m close to completing a first release of the PGXN v2 build SDK with
support for meta spec v1 and v2, PGXS, and pgrx extensions. I expect the
first deliverable to be a command-line client to complement and eventually
replace the original CLI. It will be put to work building all the extensions
currently distributed on PGXN, which will surface new issues and patterns
that inform the development and completion of the v2 meta spec.
In the future, I’d also like to:
Finish working out Trunk format and dependency patterns
Develop and submit the proposed extension_search_path patch
Submit ImageVolume feedback to Kubernetes to allow runtime mounting
Start building and distributing OCI Trunk packages
Make the pattern available for distributed registries, so anyone can build
their own Trunk releases!
Hack fully-dynamic extension loading into CloudNativePG
Let’s Talk
I recognize the ambition here, but feel equal to it. Perhaps not every bit
will work out, but I firmly believe in setting a clear vision and executing
toward it while pragmatically revisiting and revising it as experience
warrants.
If you’d like to contribute to the project or employ me to continue working on
it, let’s talk! Hit me up via one of the services listed on the about page.
The feature does not yet support pre-loading shared libraries.
Presumably a flag will be introduced to add the extension to
shared_preload_libraries. ↩︎
Though we should certainly request the ability to add new
ImageVolume mounts without a restart. We can’t be the only ones thinking
about this kind of feature, right? ↩︎
In general, one should avoid LD_LIBRARY_PATH for a variety of
reasons, not least of which its bluntness. For various security reasons,
macOS ignores it unless SIP is disabled, and SELinux prevents its
propagation to new processes. ↩︎
I last wrote about auto-releasing PostgreSQL extensions on PGXN back in
2020, but I thought it worthwhile, following my Postgres
Extensions Day talk last week, to return again to the basics. With the
goal to get as many extensions distributed on PGXN as possible, this post
provides step-by-step instructions to help the author of any extension or
Postgres utility to quickly and easily publish every release.
PGXN aims to become the de facto source for all open-source PostgreSQL
extensions and tools, in order to help users quickly find and learn how to use
extensions to meet their needs. Currently, PGXN distributes source releases
for around 400 extensions (stats on the about page), a fraction of the ca.
1200 known extensions. Anyone looking for an extension that might exist to
solve some problem must rely on search engines to find potential solutions
scattered across PGXN, GitHub, GitLab, blogs, social media posts, and more.
Without a single trusted source for extensions, and with the proliferation
of AI slop in search engine results, finding extensions aside from a few
well-known solutions proves a challenge.
By publishing releases and full documentation — all fully indexed by its
search index — PGXN aims to be that trusted source. Extension authors
provide all the documentation, which PGXN formats for legibility and linking.
See, for example, the pgvector docs.
If you want to make it easier for users to find your extensions, to read your
documentation — not to mention provide sources for binary packaging systems
— publish every release on PGXN.
Here’s how.
Create an Account
Step one: create a PGXN Manager account. The Email, Nickname, and Why
fields are required. The form asks “why” as a simple filter for bad actors.
Write a sentence describing what you’d like to release — ideally with a link
to the source repository — and submit. We’ll get the account approved
forthwith, which will send a confirmation email to your address. Follow the
link in the email and you’ll be good to go.
Anatomy of a Distribution
A PostgreSQL extension source tree generally looks something like this (taken
from the pair repository):
Extension authors will recognize the standard PGXS (or pgrx) source
distribution files; only the META.json file needs explaining. The META.json
file is, frankly, the only file that PGXN requires in a release. It contains
the metadata to describe the release, following the PGXN Meta Spec.
This example contains only the required fields:
{
  "name": "pair",
  "version": "0.1.0",
  "abstract": "A key/value pair data type",
  "maintainer": "David E. Wheeler <david@justatheory.com>",
  "license": "postgresql",
  "provides": {
    "pair": {
      "file": "sql/pair.sql",
      "version": "0.1.0"
    }
  },
  "meta-spec": {
    "version": "1.0.0"
  }
}
Presumably these fields contain no surprises, but a couple of details:
It starts with the name of the distribution, pair, and the release
version, 0.1.0.
The abstract provides a brief description of the extension, while the
maintainer contains contact information.
The license stipulates the distribution license, of course, usually one
of a few known, but may be customized.
The provides object lists the extensions or tools provided, each named
by an object key that points to details about the extension, including
main file, version, and potentially an abstract and documentation file.
The meta-spec object identifies the meta spec version used for the
META.json itself.
Release It!
This file with these fields is all you need to make a release. Assuming Git,
package up the extension source files like so (replacing your extension name
and version as appropriate).
git archive --format zip --prefix=pair-0.1.0/ -o pair-0.1.0.zip HEAD
Then navigate to the release page, authenticate, and upload the resulting
.zip file.
And that’s it! Your release will appear on pgxn.org and on Mastodon
within five minutes.
Let’s Automate it!
All those steps would be a pain in the ass to follow for every release. Let’s
automate it using pgxn-tools! This OCI image contains the tools necessary to
package and upload an extension release to PGXN. Ideally, use a CI/CD
pipeline like a GitHub Workflow to publish a release on every version tag.
Set up Secrets
pgxn-tools uses your PGXN credentials to publish releases. To keep them
safe, use the secrets feature of your preferred CI/CD tool. This figure shows
the “Secrets and variables” configuration for a GitHub repository, with two
repository secrets: PGXN_USERNAME and PGXN_PASSWORD:
Create a Pipeline
Use those secrets and pgxn-tools in CI/CD pipeline. Here, for example, is a
minimal GitHub workflow to publish a release for every SemVer tag:
on:
  push:
    tags: ['v[0-9]+.[0-9]+.[0-9]+']
jobs:
  release:
    name: Release on PGXN
    runs-on: ubuntu-latest
    container: pgxn/pgxn-tools
    env:
      PGXN_USERNAME: ${{ secrets.PGXN_USERNAME }}
      PGXN_PASSWORD: ${{ secrets.PGXN_PASSWORD }}
    steps:
    - name: Check out the repo
      uses: actions/checkout@v4
    - name: Bundle the Release
      run: pgxn-bundle
    - name: Release on PGXN
      run: pgxn-release
Details:
Line 3 configures the workflow to run on a SemVer tag push, typically
used to denote a release.
Line 8 configures the workflow job to run inside a pgxn-tools container.
Lines 10-11 set environment variables with the credentials from the
secrets.
Line 16 bundles the release using either git archive or zip.
Line 18 publishes the release on PGXN.
Now publishing a new release is as simple as pushing a SemVer tag, like so:
git tag v0.1.0 -sm 'Tag v0.1.0'
git push --follow-tags
That’s it! The workflow will automatically publish the extension for every
release, ensuring the latest and greatest always make it to PGXN where users
and packagers will find them.
The pgxn-tools image also provides tools to easily test a PGXS or pgrx
extension on supported PostgreSQL versions (going back as far as 8.2), also
super useful in a CI/CD pipeline. See Test Postgres Extensions With GitHub
Actions for instructions. Depending on your CI/CD tool of choice, you might
take additional steps, such as publishing a release on GitHub, as previously
described.
Optimizing for PGXN
But let’s dig deeper into how to optimize extensions for maximum
discoverability and user visibility on PGXN.
Add More Metadata
The META.json file supports many more fields that PGXN indexes and
references. These improve the chances users will find what they’re looking
for. This detailed example demonstrates how a PostGIS META.json file might
start to provide additional metadata:
{
  "name": "postgis",
  "abstract": "Geographic Information Systems Extensions to PostgreSQL",
  "description": "This distribution contains a module which implements GIS simple features, ties the features to R-tree indexing, and provides many spatial functions for accessing and analyzing geographic data.",
  "version": "3.5.0",
  "maintainer": [
    "Paul Ramsey <pramsey@example.com>",
    "Sandro Santilli <sandro@examle.net>"
  ],
  "license": ["gpl_2", "gpl_3"],
  "provides": {
    "postgis": {
      "abstract": "PostGIS geography spatial types and functions",
      "file": "extensions/postgis/postgis.control",
      "docfile": "extensions/postgis/doc/postgis.md",
      "version": "3.5.0"
    },
    "address_standardizer": {
      "abstract": "Used to parse an address into constituent elements. Generally used to support geocoding address normalization step.",
      "file": "extensions/address_standardizer/address_standardizer.control",
      "docfile": "extensions/address_standardizer/README.address_standardizer",
      "version": "3.5.0"
    }
  },
  "prereqs": {
    "runtime": {
      "requires": {
        "PostgreSQL": "12.0.0",
        "plpgsql": 0
      }
    },
    "test": {
      "recommends": {
        "pgTAP": 0
      }
    }
  },
  "resources": {
    "bugtracker": {
      "web": "https://trac.osgeo.org/postgis/"
    },
    "repository": {
      "url": "https://git.osgeo.org/gitea/postgis/postgis.git",
      "web": "https://git.osgeo.org/gitea/postgis/postgis",
      "type": "git"
    }
  },
  "generated_by": "David E. Wheeler",
  "meta-spec": {
    "version": "1.0.0",
    "url": "https://pgxn.org/meta/spec.txt"
  },
  "tags": [
    "gis",
    "spatial",
    "geometry",
    "raster",
    "geography",
    "location"
  ]
}
Line 4 contains a longer description of the distribution.
Lines 6-9 show how to list multiple maintainers as an array.
Line 10 demonstrates support for an array of licenses.
Lines 11-24 list multiple extensions included in the distribution, with
abstracts and documentation files for each.
Lines 25-37 identify dependencies for various phases of the distribution
lifecycle, including configure, build, test, runtime, and develop. Each
contains an object identifying PostgreSQL or extension dependencies.
Lines 38-47 list resources for the distribution, including issue
tracking and source code repository.
Lines 53-60 contain an array of tags, an arbitrary list of keywords for a
distribution used both in the search index and the PGXN tag cloud.
Admittedly the PGXN Meta Spec provides a great deal of information.
Perhaps the simplest way to manage it is to copy an existing META.json from
another project (or above) and edit it. In general, only the version fields
require updating for each release.
Write Killer Docs
The most successful extensions provide ample descriptive and reference
documentation, as well as examples. Most extensions feature a README, of
course, which contains basic information, build and install instructions, and
contact info. But as the pair tree illustrates,
PGXN also supports extension-specific documentation in a variety of formats,
including:
PGXN will also index and format additional documentation files in any of the
above formats. See, for example, all the files formatted for orafce.
Exclude Files from Release
Use gitattributes to exclude files from the release. For example,
distributions don’t generally include .gitignore or the contents of the
.github directory. Exclude them from the archive created by git archive by
assigning export-ignore to each path to exclude in the .gitattributes
file, like so:
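For example, a .gitattributes covering the usual suspects might contain:

```
.gitignore export-ignore
.gitattributes export-ignore
.github export-ignore
```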
PGXN aims to be the trusted system of record for open-source PostgreSQL
extensions. Of course that requires that it contain all (or nearly all) of
said extensions. Hence this post.
Please help make it so by adding your extensions, both to help users find the
extensions they need, and to improve the discoverability of your extensions.
Over time, we aim to feed downstream extension distribution systems, such as
Yum, APT, CloudNativePG, OCI, and more.
Let’s make extensions available everywhere to everyone.
The final PostgreSQL Extension Mini-Summit took place on May 7. Gabriele
Bartolini gave an overview of PostgreSQL extension management in
CloudNativePG (CNPG). This talk brings together the topics of several
previous Mini-Summits — notably Peter Eisentraut on implementing an
extension search path — to look at the limitations of extension
support in CloudNativePG and the possibilities enabled by the extension search
path feature and the Kubernetes 1.33 ImageVolume feature. Check it out:
Or read on for the full transcript with thanks to Floor Drees for putting it
together.
Introduction
Floor Drees.
On May 7 we hosted the last of five virtual Mini-Summits that led up to
the big one at the Postgres Development Conference (PGConf.Dev), taking place
next week, in Montreal, Canada. Gabriele Bartolini, CloudNativePG
maintainer, PostgreSQL Contributor, and VP Cloud Native at EDB, joined to
talk about improving the Postgres extensions experience in Kubernetes with
CloudNativePG.
Floor Drees, Principal Program Manager at EDB, PostgreSQL CoCC member,
PGDay Lowlands organizer
The stream and the closed captions available for the recording are supported
by PGConf.dev and their gold level sponsors, Google, AWS, Huawei, Microsoft,
and EDB.
Improving the Postgres extensions experience in Kubernetes with CloudNativePG
Gabriele Bartolini.
Hi everyone. Thanks for this opportunity, and thank you Floor and David for
inviting me today.
I normally start every presentation with a question, and this is actually the
question that has been hitting me and the other maintainers of CloudNativePG
— and some are in this call — from the first day. We know that extensions
are important in Kubernetes, in Postgres, and we’ve always been asking how can
we deploy extensions, without breaking the immutability of the container.
So today I will basically be telling our story, and hopefully providing good
insights into how, with CloudNativePG, we are trying to improve the experience
of Postgres extensions when running databases in Kubernetes, including the
open issues.
I’ve been using Postgres for 25 years. I’m one of the co-founders of
2ndQuadrant, which was bought by EDB in 2020. And because of my
contributions, I’ve been recognized as a Postgres contributor and I’m really
grateful for that. And I’m also “Data on Kubernetes ambassador”; my role is to
promote the usage of stateful workloads in Kubernetes. I'm also a DevOps
evangelist. I always say this: DevOps is the reason why I encountered
Kubernetes, and it will also be the reason why I move away one day from
Kubernetes. It’s about culture and I’ll explain this later.
In the past I’ve been working with Barman; I’m one of the creators of
Barman. And since 2022, I’m one of the maintainers of CloudNativePG. I want
to thank my company, EDB, for being the major contributor in Postgres
history in terms of source code. And right now we are also the creators of
CloudNativePG. And as we’ll see, the company donated the IP to the CNCF. So
it’s something that is quite rare, and I’m really grateful for that.
What I plan to cover tonight is first, set the context and talk about
immutable application containers, which have been kind of a dogma for us from
day one. Then, how we are handling right now extensions in Kubernetes with
CNPG. This is quite similar to the way other operators deal with it. Then the
future and key takeaways.
First, we’re talking about Kubernetes. If you’re not familiar, it’s an
orchestration system for containers. It’s not just an executor of containers,
but it’s a complex system that also manages infrastructure. When it manages
infrastructure, it also manages cloud native applications that are also called
workloads. When we’re thinking about Postgres in Kubernetes, the database is a
workload like the others. That, I think, is the most important mind shift
among Postgres users that I have faced myself, as I've always treated
Postgres differently from the rest. Here in Kubernetes it's just another
workload.
Then of course, it’s not like any other workload, and that’s where operators
come into play, and I think the work that we are doing even tonight is in the
direction of improving how databases are run in Kubernetes in general, and for
everyone.
Kubernetes was open sourced in 2014, and it's owned by the CNCF; it's actually
the first project that graduated, and graduated is the most advanced stage in
the graduation process of the CNCF, which starts with sandbox, then incubation
and then graduation.
CloudNativePG is an operator for Postgres. It’s production-ready — what we
say is level five. Level five is kind of a utopic, unbounded level, the
highest one as defined by the operator development framework. It’s used by all
these players including Tembo, IBM Cloud Paks, Google Cloud, Azure, Akamai,
and so on. CNPG has been a CNCF project since January. It's distributed under Apache
License 2.0 and the IP — the Intellectual Property — is owned by the
community and protected by the CNCF. It therefore is a vendor neutral and
openly governed project. This is kind of a guarantee that it will always be
free. This is also, in my opinion, a differentiation between CloudNativePG and
the rest.
The project was originally created by EDB, but specifically at that time, by
2ndQuadrant. And, as I always like to recall, it was Simon Riggs that put me
in charge of the initiative. I’ll always be grateful to Simon, not only for
that, but for everything he has done for me and the team.
CNPG can be installed in several ways. As you can see, it’s very popular in
terms of stars. There’s more than 4,000 commits. And what’s impressive is the
number of downloads in three years, which is 78 million, which means that it’s
used the way we wanted it to be used: with CI/CD pipelines.
This is the CNCF landscape; these are the CNCF projects. As you can see,
there are only five projects in the CNCF in the database area, and
CloudNativePG is the only one for Postgres. Our aim for 2025 and 2026 is to
become incubating. If you’re using CNPG and you want to help with the process,
get in touch with me and Floor.
I think, to understand why we've done all this process that led to the patch
you've seen in Postgres 18, it's important to understand what cloud native has
meant to us since we started in 2019. We've got our own definition, but I
think it still applies. For us, cloud native is three things. First, it's
people that work following the DevOps culture. For example, there are some
capabilities that come from DevOps that apply to the cloud native world.
I selected some of them like in user infrastructure, infrastructure
abstraction, version control. These three form the infrastructure-as-code
principle, together with the declarative configuration.
A shift left on security. You’ll see with CloudNativePG, we rarely mention
security because it’s pretty much everywhere. It’s part of the process. Then
continuous delivery.
The second item is immutable application containers, which kind of led the
immutable way of thinking about extensions. And then the third one is that
these application containers must be orchestrated via an
infrastructure-as-code by an orchestrator, and the standard right now is
Kubernetes.
For us it’s these three things, and without any of them, you cannot achieve
cloud native.
So what are these immutable application containers? To explain immutability
I’d like to talk about immutable infrastructure, which is probably what the
majority of people that have historically worked with Postgres are used to.
I’m primarily referring to traditional environments like VMs and bare metal
where the main way we deploy Postgres is through packages, maybe even managed
by configuration managers, but still, packages are the main artifacts. The
infrastructure is seen as a long-term kind of project. Changes happen over
time and are incremental updates, updates on an existing infrastructure. So if
you want to know the history of the infrastructure over time, you need to
check all the changes that have been applied. In case of failure of a system,
systems are healed. So that's the "pets" concept that comes from DevOps.
On the other hand, immutable infrastructure relies on OCI container images.
OCI is a standard, the Open Container Initiative and it’s part of the
Linux Foundation as well. Immutable infrastructure is founded on continuous
delivery, which is the foundation of GitOps practices. In an immutable
infrastructure, releasing a new version of an application is not updating the
system’s application, it is building a new image and publishing it on a public
registry and then deploying it. Changes in the system happen in an atomic way:
the new version of a container is pulled from the registry and the existing
image is almost instantaneously replaced by the new one. This is true for
stateless applications; as we'll see, in the case of stateful applications
like Postgres, it's not that instantaneous, because we need to perform a
switchover or restart — in any case, generate a downtime.
When it comes to Kubernetes, the choice was kind of obvious to go towards that
immutable infrastructure. So no incremental updates, and in the case of
stateful workloads where you cannot change the content of the container, you
can use data volumes or persistent volumes. These containers are not changed.
If you want to change even a single file or a binary in a container image, you
need to create a new one. This is very important for security and change
management policies in general.
But what I really like about this way of managing our infrastructure is that,
at any time, Kubernetes knows exactly what software is running in your
infrastructure. All of this is versioned in an SCM, like Git or whatever.
This is something that in the mutable world is less easy to obtain. Again, for
security, this is the foundational thing because this is how you can control
CVEs, the vulnerabilities in your system. This is a very basic
representation of, let's say, the lifecycle of a container image: you create
a Dockerfile, you put it in Git, for example, then there's an action or a
pipeline that creates the container image, maybe even runs some tests, and
then pushes it to the container registry.
I walked you through the concepts of mutable and immutable infrastructure, so
what are these immutable application containers? If you go back and read what
we were writing before CloudNativePG was famous or even used, we were always
putting immutable application containers forward as one of the principles we
could not lose.
For an immutable application container, it means that there’s only a single
application running; that’s why it’s called “application”. If you have been
using Docker, you are more familiar with system containers: you run a Debian
system, you just connect and then you start treating it like a VM. Application
containers are not like that. And then they are immutable — read-only — so
you cannot even make any change or perform updates of packages. But in
CloudNativePG, because we are managing databases, we need to put the database
files in separate persistent volumes. Persistent volumes are standard
resources provided by Kubernetes. This is where we put PGDATA and, if you
want, a separate volume for WAL files with different storage specifications
and even an optional number of tablespaces.
CloudNativePG orchestrates what we call “operand images”. These are very
important to understand. They contain the Postgres binaries and they’re
orchestrated via what we call the “instance manager”. The instance manager is
just the process that runs and controls Postgres; it's the PID 1 — or the
entry point — of the container.
There's no SSHD or other applications running. There's just
the instance manager that then controls everything else. And this is the
project for the operand images. This is one open source project, and every
week we rebuild the Postgres containers. We recently made some changes to the
flavors of these images and I’ll talk about it shortly.
We mentioned the database, we mentioned the binaries, but what about
extensions? This is the problem. Postgres extensions in Kubernetes with
CloudNativePG is the next section, and it’s kind of a drama. I’m not hiding
this. The way we are managing extensions in Kubernetes right now, in my
opinion, is not enough. It works, but it’s got several limitations — mostly
limitations in terms of usage.
For example, we cannot place them in the data files or in persistent volumes
because these volumes are not read-only in any way. In any case, they cannot
be strictly immutable. So we discarded this option to have persistent volume
where you could kind of deploy extensions and maybe you can even download on
the fly or use the package manager to download them or these kind of
operations. We discarded this from the start and we embraced the operand image
solution. Essentially what we did was placing these extensions in the same
operand image that contains the Postgres binaries. This is a typical approach
of the other operators as well. If you think about Zalando, there's what we call "the
Spilo way”. Spilo contained all the software that would run with the Zalando
operator.
Our approach was a bit different, in that we wanted lighter images, so we
created a few flavors of images, and also selected some extensions that we
placed in the images. But in general, we recommended building custom images.
We provided instructions and we’ve also provided the requirements to build
container images. But as you can see, the complexity of the operational layer
is quite high, it’s not reasonable to ask any user or any customer to build
their own images.
This is how they look now, although this is changing as I was saying:
You’ve got a base image, for example, the Debian base image. You deploy the
Postgres binaries. Then — even right now though it’s changing —
CloudNativePG requires Barman Cloud to be installed. And then we install the
extensions that we think are needed. For example, I think we distribute
pgAudit, if I recall correctly, pgvector and pg_failover_slots. Every
layer you add, of course, the image is heavier and we still rely on packages
for most extensions.
The problem is, you’ve got a cluster that is already running and you want, for
example, to test an extension that’s just come out, or you want to deploy it
in production. If that extension is not part of the images that we build, you
have to build your own image. Because of the possible combinations of
extensions that exist, it’s impossible to build all of these combinations. You
could build, for example, a system that allows you to select what extensions
you want and then build the image, but in our way of thinking, this was not
the right approach. And then you've got system dependencies: if an
extension brings a vulnerability, it affects the whole image and requires
more updates — not just of the cluster, but also rebuilds of the image.
We wanted to do something else, but we immediately faced some limitations of
the technologies. One was on Postgres, the other one was on Kubernetes. In
Postgres, extensions need to be placed in a single folder. It’s not possible
to define multiple locations, but thanks to the work that Peter and his team
have done, now we’ve got extension_control_path in version 18.
Kubernetes did not allow, until 10 days ago, mounting OCI artifacts as
read-only volumes. There's a new feature, now part of Kubernetes 1.33,
that allows us to do it.
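As a sketch of that Kubernetes 1.33 feature, a pod might mount an extension's OCI image as a read-only volume along these lines (the image names and paths here are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pg-with-extension
spec:
  containers:
    - name: postgres
      image: ghcr.io/example/postgresql:18        # hypothetical operand image
      volumeMounts:
        - name: vector
          mountPath: /extensions/vector           # hypothetical mount point
          readOnly: true
  volumes:
    - name: vector
      image:                                      # image volume, new in 1.33
        reference: ghcr.io/example/pgvector:18    # hypothetical extension image
        pullPolicy: IfNotPresent
```

The extension's files become visible to Postgres under the mount path without ever modifying the operand image itself.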
This is the patch that I was talking about, by Peter Eisentraut. I’m
really happy that CloudNativePG is mentioned as one of the use cases.
And there’s also mentioned for the work that, me, David, and Marco and,
primarily Marco and Niccolò from CloudNativePG have done.
This is the patch that introduced the image VolumeSource in Kubernetes 1.33.
The idea is that with Postgres 18 now we can set in the configuration where we
can look up for extensions in the file system. And then, if there are
libraries, we can also use the existing dynamic_library_path GUC.
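To sketch how those two GUCs might be coordinated — the /extensions/... paths are hypothetical mount points — a postgresql.conf fragment could look like:

```
extension_control_path = '$system:/extensions/vector/share'
dynamic_library_path = '$libdir:/extensions/vector/lib'
```

Here `$system` and `$libdir` keep the built-in locations in the search order, with each mounted extension appended after them.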
So, you remember, this is where we come from [image above]; the good thing is
we have the opportunity to build Postgres images that are minimal, that only
contain Postgres.
Instead of recreating them every week — because it's very likely that some
dependency has a CVE, so we recreate them for everyone, forcing everyone to
update their Postgres systems — we can now release them maybe once a month,
pretty much following the cadence of Postgres patch releases, and maybe
release more frequently if there are CVEs.
The other good thing is that now we are working to remove the dependency on
Barman Cloud for CloudNativePG. CloudNativePG has a new plugin interface and
people start moving new workloads to the Barman Cloud plugin solution with
1.26, which is expected in the next weeks. What
people start moving new workloads to the Barman Cloud plugin solution. What
happens is that Barman Cloud will be in that sidecar image. So it will be
distributed separately, and so its lifecycle is independent from the rest. But
the biggest advantage is that any extension in Postgres can be distributed —
right now we’ve got packages — The idea is that they are distributed also as
images.
If we start thinking about this approach, if I write an extension for
Postgres, until now I’ve been building only packages for Debian or for RPM
systems. If I start thinking about also building container images, they could
be immediately used by the new way of CloudNativePG to manage extensions.
That’s my ultimate goal, let’s put it that way.
This is how things will change at run time without breaking immutability.
There will be no more need to think about all the possible combinations of
extensions. There will be the Postgres pod that runs, for example, a primary
or standby, that will have the container for Postgres. If you’re using Barman
Cloud, the sidecar container managed by the plugin with Barman Cloud. And
then, for every extension you have, you will have a different image volume
that is read-only, very light, only containing the files distributed in the
container image of the extension, and that’s all.
Once you’ve got these, we can then coordinate the settings for external
extension_control_path and dynamic_library_path. What we did was, starting
a fail fast pilot project within EDB to test the work that Peter was doing on
the extension_control_path. For that we used the Postgres Trunk Containers
project, which is a very interesting project that we have at
CloudNativePG. Every day it rebuilds the latest snapshot of the master branch
of Postgres so that we are able to catch, at an early stage, problems with the
new version of Postgres in CloudNativePG. But there’s also an action that
builds container images for a specific, for example, Commitfest patch. So we
use that.
Niccolò wrote a pilot patch, an exploratory patch, for the operator to define
the extensions stanza inside the cluster resource. He also built some bare
container images for a few extensions. We made sure to include a very simple
one and the most complex one, which is PostGIS. This is the patch — it's
still a draft — and the idea is to have it in the next version of
CloudNativePG, 1.27. This is how it works:
We have the extensions section in the cluster definition. We name the
extension. Theoretically we could also define the version and we point to the
image. What’s missing in this pilot patch is support for image catalogs, but
that’s something else that we can worry about later.
What happens under the hood is that when you update, or when you add a new
extension in the cluster definition, a rolling update is initiated. So there’s
this short downtime, but the container image is loaded in the replicas first,
and then in the primary. An image volume is mounted for each extension in,
let's say, an /extensions/$name_of_extension folder, and CNPG updates these two
parameters. It's quite clean, quite neat. It works, but most of the work needs
to happen here. So that's been my call: to treat container images as
first-class artifacts. If this changes, we have a new way to distribute
images.
Just to approach the conclusion: if you want to know more about the whole
story, I wrote this blog article that recaps everything. And the key
takeaway for me — then we can go deeper on the patch if you want, and also
address the questions. But what is important for me? Being in the Postgres
community for a long time, I think this is a good way, a good moment for us to
challenge the status quo of the extension distribution ecosystem.
I think we have an opportunity now to define a standard, which, I just want to
be clear, I’m focusing myself primarily on CNPG, but this is in general, even
for other operators. I’m sure that this will benefit everyone and overall it
will reduce the waste that we collectively create when distributing these
extensions in Kubernetes. If this becomes a standard way to distribute
extensions, the benefits will be much better operational work for everyone,
primarily also easier testing and validation of extensions. I mean, right now,
if you see an extension, ideally it's very easy — if you're on GitHub — to
build the container images. GitHub, for example, already provides the whole
infrastructure for you to easily build container images.
So if we find a standard way to define a GitHub action to build Postgres
extensions, I think, if you’re a developer of an extension, you can just use
it and then you find a registry in your project directly that continuously
publishes or periodically publishes this extension. Any user can just
reference that image URL and then without having to build images, they’re just
one rolling update away from testing a patch, testing also the upgrade paths.
I think there are some unknown unknowns that kind of scare me, in general,
about upgrades, upgrades of extensions. This is, in my opinion, one of the
biggest issues. It’s not that they’re not solved, but they require more
attention and more testing if you're using them in an immutable world. All of
this will, in my opinion, be much, much better with the approach we've
proposed. Images will be lighter, and a lighter image is also safer and more
secure: less prone to have CVEs, less prone to require frequent updates, and
it also reduces the usage of bandwidth for an organization in general.
What I was saying before, any extension project can be fully independent,
have their own way to build images and publish them.
One last point. I keep hearing, many times, that all of the stuff we are
proposing right now seems like a kind of limitation of Kubernetes. The way I
see it, it's not actually a limitation; it's that these problems have never
been addressed before. The biggest mistake we can make is to
focus on the specific problem of managing extensions without analyzing the
benefits that the entire stack brings to an organization. Kubernetes brings a
lot of benefits in terms of security, velocity, change management and,
operations that any organization must consider right now. Any Postgres DBA,
any Postgres user: my advice is, if you haven't done it yet, start taking
Kubernetes seriously.
Discussion
Floor: I do think that David, you wanted to talk maybe a little bit about the
mutable volume pattern?
David: Well, if people are interested, in your early slide where you were
looking at alternatives, one you were thinking of was putting extensions on a
mutable volume and you decided not to do that. But at Tembo we did do that and
I did a bunch of work trying to improve it and try to minimize image size and
all that in the last couple months. Tembo Cloud is shutting down now, so I had
to stop before I finished it, but I made quite a bit of progress. I’m happy to
kind of talk through the ideas there. But I think that this approach is a
better long term solution, fundamentally.
Gabriele: I would like if Marco and Niccolò, if you want to talk about the
actual work you’ve done. Meanwhile, Peter asks, “why does an installation of
an extension require a small downtime?” The reason is that at the moment,
with the image volume patch, if you add a new image volume, it requires the pod to
restart. Nico or Marco, Jonathan, if you want to correct me on that.
Nico or Marco or Jonathan: It provides a rolling update of the cluster right
now.
Gabriele: So that’s the reason. That’s the only drawback, but the benefits in
my opinion, are…
David: My understanding is that, to add a new extension, it's mounted in a
different place. And because every single extension is its own mount, you have
to add it to both those GUCs. And at least one of them requires a restart.
Gabriele: But then for example, we’ve had this conversation at EDB for
example, we’re planning to have flavors of predefined extensions. For
example, you can choose a flavor and we distribute those extensions. For
example, I dunno, for AI we place some AI kind of extensions in the same
image, so it would be different.
But otherwise I’m considering the most extreme case of one extension, one
container image, which in my opinion, for the open source world is the way
that hopefully will happen. Because this way, think about that – I haven’t
mentioned this — if I write an extension, I can then build the image and
then run automated tests using Kubernetes to assess my extension on GitHub. If
those tests fail, my commit will never be merged on main. This is trunk
development, continuous delivery. This is, in my opinion, a far better way of
delivering and developing software. This is, again, the reason why we ended up
in Kubernetes. It’s not because it’s a technology we like, it’s a toy or so,
it’s because it solves bigger problems than database problems.
Even when we talk about databases, there’s still work that needs to be done,
needs to be improved. I’m really happy that we have more people that know
Postgres nowadays that are joining CloudNativePG, and are elevating the
discussions more and more on the database level. Because before it was
primarily on Kubernetes level, but now we see people that know Postgres better
than me get in CloudNativePG and propose new ideas, which is great. Which is
the way it needs to be, in my opinion.
But I remember, Tembo approached us because we actually talked a lot with
them. Jonathan, Marco, I’m sure that you recall, when they were evaluating
different operators and they chose CloudNativePG. I remember we had these
discussions where they asked us to break immutability and we said, “no way”.
That’s why I think Tembo had to do the solution you described, because we
didn’t want to do it upstream.
I think, to be honest, and to be fair, if image volumes were not added, we
would’ve probably gone down that path, because this way of managing
extensions, as I was saying, is not scalable, the current one. Because we want
to always improve, I think that the approach we need to be critical on what we
do. So, I don’t know, Niccolò, Marco, I would like you to, if you want, explain
briefly.
```dockerfile
FROM ghcr.io/cloudnative-pg/postgresql-trunk:18-devel AS builder
USER 0
COPY . /tmp/pgvector
RUN set -eux; \
    mkdir -p /opt/extension && \
    apt-get update && \
    apt-get install -y --no-install-recommends build-essential clang-16 llvm-16-dev && \
    cd /tmp/pgvector && \
    make clean && \
    make OPTFLAGS="" && \
    make install datadir=/opt/extension/share/ pkglibdir=/opt/extension/lib/

FROM scratch
COPY --from=builder /opt/extension/lib/* /lib/
COPY --from=builder /opt/extension/share/extension/* /share/
```
Niccolò: I forked, for example, pgvector. That's what we can do basically
for every simple extension that we can just build. This is a bit more
complicated because we have to build from a trunk version of Postgres 18. So
we have to compile pgvector from source, and then in a scratch layer we just
archive the libraries and every other content that was previously built. But
ideally whenever PG 18 comes out as a stable version of Postgres, we just need
to apt install pgvector and grab the files from the path. Where it gets a
bit more tricky is in the case of PostGIS, or TimescaleDB, or any
extension whose library requires third party libraries. For example, PostGIS
has a strong requirement on the geometric libraries, so you need to import
them as well inside the mount volume. I can link you an example of the
PostGIS one.
Gabriele: I think it’s important, we’ve got, I think Peter here, David as
well, I mean, for example, if we could get standard ways in Postgres to
generate Dockerfiles for extensions, that could be great. And as I said,
these extensions can be used by any operator, not only CNPG.
David: That’s my POC does. It’s a patch against the PGXS that would build a
trunk image.
Gabriele: This is the work that Niccolò had to do to make PostGIS work in the
pilot project: he had to copy everything.
Niccolò: I think we can make it a little bit smoother and dynamically figure
everything out from the libraries, so we don't have to hard-code everything
like this, but this is just a proof of concept that it can
work.
David: So you installed all those shared libraries that were from packages.
Niccolò: Yeah, they’re being copied in the same MountVolume where the actual
extensions are copied as well. And then the pilot patch is able to set up the
library path inside the pod so that it makes the libraries available to the
system because of course, these libraries are only part of the MountVolume.
They’re not injected inside the system libraries of the pod, so we have to set
up the library path to make them available to Postgres. That’s how we’re able
to use them.
David: So they end up in PKGLIBDIR but they still work.
Niccolò: Yeah.
Gabriele: I mean, there’s better ideas, better ways. As Niccolò also said, it
was a concept.
David: Probably a lot of these shared libraries could be shared with other
extensions. So you might actually want other OCI images that just have some of
the libraries that are shared between them.
Gabriele: Yeah, absolutely. So we could work on a special kind of extension,
or even metadata, so that we can place, you know…
So, yeah, that’s it.
Jonathan: I think it’s important to invite everyone to try and test this,
especially the Postgres trunk containers, when they want to try something
new stuff, new like this one, just because we always need people testing. When
more people review and test, it’s amazing. Because every time we release
something, probably we’ll miss something, some extension like PostGIS missing
one of the libraries that wasn’t included in the path. Even if we can try to
find a way to include it, it will not be there. So testing, please! Test all
the time!
Gabriele: Well, we’ve got this action now, they’re failing. I mean, it’s a bit
embarrassing. [Cross talk.] We already have a patch to fix it.
But I mean, this is a great project as I mentioned before, because it allows
us to test the current version of Postgres, but also, if you want to build
from a Commitfest patch or from your own Postgres repository with sources,
you can compile and get the images using this project.
Floor: Gabriele, did you want to talk about SBOMs?
Gabriele: I forgot to mention Software Bill of Materials. They’re very
important. It’s kind of now basic for any container image. There’s also the
possibility to add them to these container images too. This is very important.
Again, in change management, for security and all of that — in general, the
supply chain. And signatures too. But we've got signatures for packages as well.
There's also attestation of provenance.
For a side project, I’m converting JSON inputs to CBOR, or Concise Binary
Object Representation, defined by RFC 8949, in order to store a more compact
representation in the database. This Go app uses the encoding/json package's
UseNumber decoding option to preserve numbers as strings, rather than
float64s. Alas, CBOR has no support for such a feature, so such values
cannot survive a round-trip to CBOR and back, as demonstrated by this example
using the github.com/fxamacker/cbor package (playground):
```go
// Decode JSON number using json.Number.
input := bytes.NewReader([]byte(`{"temp": 98.6}`))
dec := json.NewDecoder(input)
dec.UseNumber()
var val map[string]any
if err := dec.Decode(&val); err != nil {
	log.Fatalf("Err: %v", err)
}

// Encode as CBOR.
data, err := cbor.Marshal(val)
if err != nil {
	log.Fatalf("Err: %v", err)
}

// Decode back into Go.
var newVal map[string]any
if err := cbor.Unmarshal(data, &newVal); err != nil {
	log.Fatalf("Err: %v", err)
}

// Encode as JSON.
output, err := json.Marshal(newVal)
if err != nil {
	log.Fatalf("Err: %v", err)
}
fmt.Printf("%s\n", output)
```
The output:
{"temp":"98.6"}
Note that the input on line 2 contains the number 98.6, but once the value
has been transformed to CBOR and back it becomes the string "98.6".
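To see why preserving numbers as strings matters in the first place, here’s a small sketch (not from the app itself) of the precision loss that motivates UseNumber: without it, encoding/json decodes every number to float64, which cannot exactly represent integers above 2⁵³.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTrip decodes JSON into map[string]any and re-encodes it.
// Without dec.UseNumber(), every number becomes a float64 on the
// way through.
func roundTrip(in []byte) []byte {
	var val map[string]any
	if err := json.Unmarshal(in, &val); err != nil {
		panic(err)
	}
	out, err := json.Marshal(val)
	if err != nil {
		panic(err)
	}
	return out
}

func main() {
	// 9007199254740993 is 2^53 + 1, the first integer a float64
	// cannot represent; the trailing 3 is silently lost.
	fmt.Println(string(roundTrip([]byte(`{"id": 9007199254740993}`))))
	// Prints: {"id":9007199254740992}
}
```

UseNumber avoids exactly this loss, at the cost of the string-typed values that CBOR tag 284 then addresses.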
I wanted to preserve JSON numbers treated as strings. Fortunately, CBOR uses
numeric tags to identify data types, and includes a registry maintained by
IANA. I proposed a new tag for JSON numbers as strings and, through a few
iterations, the CBOR group graciously accepted the formal description of
semantics and assigned tag 284 in the registry.
Now any system that handles JSON numbers as strings can use this tag to
preserve the numeric representation in JSON output.
// Create tag 284 for JSON Number as string.
tags := cbor.NewTagSet()
tags.Add(
	cbor.TagOptions{
		EncTag: cbor.EncTagRequired,
		DecTag: cbor.DecTagRequired,
	},
	reflect.TypeOf(json.Number("")),
	284,
)

// Create a custom CBOR encoder and decoder:
em, _ := cbor.EncOptions{}.EncModeWithTags(tags)
dm, _ := cbor.DecOptions{
	DefaultMapType: reflect.TypeOf(map[string]any(nil)),
}.DecModeWithTags(tags)

// Decode JSON number using json.Number.
input := bytes.NewReader([]byte(`{"temp": 98.6}`))
dec := json.NewDecoder(input)
dec.UseNumber()
var val map[string]any
if err := dec.Decode(&val); err != nil {
	log.Fatalf("Err: %v", err)
}

// Encode as CBOR.
data, err := em.Marshal(val)
if err != nil {
	log.Fatalf("Err: %v", err)
}

// Decode back into Go.
var newVal map[string]any
if err := dm.Unmarshal(data, &newVal); err != nil {
	log.Fatalf("Err: %v", err)
}

// Encode as JSON.
output, err := json.Marshal(newVal)
if err != nil {
	log.Fatalf("Err: %v", err)
}
fmt.Printf("%s\n", output)
Lines 1-16 contain the main difference from the previous example. They create
a CBOR encoder (em) and decoder (dm) with tag 284 assigned to
json.Number values. The code then uses them rather than the cbor package
to Marshal and Unmarshal the values on lines 28 and 35. The result:
{"temp":98.6}
Et voilà! json.Number values are once again preserved.
I believe these custom CBOR encoder and decoder configurations bring full
round-trip compatibility to any regular JSON value decoded by encoding/json.
The other important config for that compatibility is the DefaultMapType
decoding option on line 15, which ensures decoded maps use string keys
rather than the CBOR-default any keys.
I’m pleased to welcome Mankirat Singh to the Postgres community as a
2025 Google Summer of Code contributor. Mankirat will be developing an ABI
compliance checker and reporting system to help identify and prevent
unintentional ABI changes in future minor Postgres releases. This follows on
the heels of the addition of ABI and API guidance in Postgres 18, as well as
the ABI-breaking Postgres 17.1 release. What timing!
Please follow Mankirat’s blog as he develops the project this summer, under
the mentorship of myself and Pavlo Golub. It should also soon be on Planet
PostgreSQL. We’ve also set up the #gsoc2025-abi-compliance-checker channel
on the community Slack for ad-hoc discussion. Join us!
The last Extension Ecosystem Mini-Summit is upon us. How did that happen?
Join us for a virtual conference session featuring Gabriele Bartolini, who
will be discussing Extension Management in CNPG. I’m psyched for this one,
as the PostgreSQL community has contributed quite a lot to improving
extension management in CloudNativePG in the past year, some of which we
covered previously. If you miss it, the video, slides, and transcript
will appear here soon.
Though it may take a week or two to get the transcripts done, considering that
PGConf.dev is next week, featuring the Extension Ecosystem Summit on
Tuesday, 13 May in Montréal, CA. Hope to see you there; be sure to say “hi!”