From 1a778ac2abf05385a031d7ad4fc844452511768a Mon Sep 17 00:00:00 2001 From: "David E. Wheeler" Date: Thu, 11 Jul 2024 17:03:50 -0400 Subject: [PATCH 1/3] RFC: Binary Distribution Format Add a new RFC describing the proposed trunk binary distribution format for PGXN packages. Inspired by Python wheel and pgt.dev, aiming to support binaries for every OS and architecture supported by PostgreSQL itself, as well as many versions of PostgreSQL. --- text/0002-binary-distribution-format.md | 368 ++++++++++++++++++++++++ 1 file changed, 368 insertions(+) create mode 100644 text/0002-binary-distribution-format.md diff --git a/text/0002-binary-distribution-format.md b/text/0002-binary-distribution-format.md new file mode 100644 index 0000000..cf2ac2e --- /dev/null +++ b/text/0002-binary-distribution-format.md @@ -0,0 +1,368 @@ +{{#title PGXN RFC–2 — Binary Distribution Format}} +* **RFC:** 2 +* **Title:** Binary Distribution Format +* **Slug:** `binary-distribution-format` +* **Start Date:** 2024-06-18 +* **Status:** Proposed Standard +* **Category:** Packaging +* **Pull Request:** [pgxn/rfcs#2](https://github.com/pgxn/rfcs/pull/2) +* **Implementation Issue:** TBD + +# RFC--2 --- Binary Distribution Format + +## Abstract + +This RFC specifies the binary distribution format for [PGXN] packages, also +called the trunk format.[^wheel] A trunk is a ZIP-format archive with a +specially formatted file name and the `.trunk` extension. It contains a single +distribution nearly as it would be installed by [PGXS]. Although a specialized +installer is recommended, a trunk file may be installed by simply copying +directories of files to destinations defined by [pg_config]. + +## Introduction + +Currently [PGXN] distributes only source code packages. Users wishing to +install and use PGXN distributions must install build tools, including `make`, +a compiler, and PostgreSQL development packages; then download, compile, and +install the distribution. 
Many users do not have the expertise to follow these +steps. Those wishing to use extensions in a production environment may not wish +to include a compiler and tooling, let alone perform compilation, on a +production host, and so must find an appropriate binary package or else create +their own. + +The proposed binary distribution format, or "trunk", aims to provide +pre-compiled PGXN distributions in a format that's straightforward to download +and install in directories defined by [pg_config]. This format will serve as a +building block for comprehensive extension packaging for multiple +versions of PostgreSQL, CPU architectures, and --- unlike other packaging +systems --- a diversity of operating systems, including Linux, macOS, various +BSDs, and Windows. + +## Guide-level explanation + +TODO. + + + +## File Format + +### File name convention + +The trunk filename is: + +``` +{package}-{version}+{pg}-{platform}.trunk +``` + +Definition of variables: + +* `package`: Package name, e.g. `pgmq`, `postgis`, `pgAdmin`, `pg_top`. +* `version`: Distribution version in [SemVer] format without build metadata, + e.g., `0.8.6` or `1.0.0-beta`. +* `pg`: Major version of Postgres the binary was built against, e.g., + `pg15`, `pg16`. +* `platform`: The platform the binary was built for. Will be made up of one + to three hyphen-delimited[^hyphen] values for the OS, version + information[^PEPs], and CPU architecture. Examples: `any`, + `gnulinux-amd64`, `darwin-23.5.0-arm64`, `musllinux-1.2-amd64v3`. The + allowed values will be defined in one or more separate RFCs. + +#### Examples: + +* `pgtap-1.0.1+pg15-any.trunk` packages `pgtap` version 1.0.1, compatible + with Postgres 15 (any minor release) on any platform. +* `pair-0.32.1+pg16-gnulinux-amd64.trunk` packages `pair` version 0.32.1, + compatible with Postgres 16 (any minor release) on GNU libc-based Linux + for amd64 CPUs. 
+* `pair-0.32.1+pg16-darwin-23.5.0-arm64.trunk` packages `pair` version + 0.32.1, compatible with Postgres 16 (any minor release) on Darwin version + 23.5.0 (macOS) for arm64 CPUs. + +#### Escaping and Unicode + +The `+` in the file name indicates the division between the package name and +version and the package metadata. The package name and version must not +include a `+`. This allows the file name, without the `.trunk` extension, to +also function as a valid [SemVer]. + +Tools producing trunks should verify that the filename components do not +contain `+`, as the resulting file may not be processed correctly if it does. + +The package name should be lowercase and the file name components should all +be UTF-8. + +The filenames *inside* the archive are encoded as UTF-8. Although some ZIP +clients in common use do not properly display UTF-8 filenames, the encoding is +supported by the ZIP specification. + +#### Parsing + +Parsing of the file name takes place in four parts: + +1. For the file name, remove the `.trunk` extension. If working with the + directory name (prefix) extracted from the archive, there will be no + `.trunk` extension. + +2. Split the name into two parts at the `+` sign. The left part is the + package name and [SemVer]. The right part is the platform specification. + +3. For the left part, split on the right-most dash. If the string to the + right of the dash is a valid [SemVer], then the left part is the package + name. If the right string is not a valid [SemVer], try again at the second + right-most dash and check again. Continue until a valid SemVer is produced + or else fail. + +4. Split the right string on dashes. There will be between two and four + values as follows: + + * Two: the postgres version (`pg16`) and `any`. + * Three: the postgres version (`pg16`), the OS (`gnulinux`, `darwin`, + etc.), and the architecture (`amd64`, `arm64`, etc.) 
+ * Four: the postgres version (`pg16`), the OS (`gnulinux`, `darwin`, + etc.), the OS version (`23.5.0`) and the architecture (`amd64`, + `arm64`, etc.) + +##### Examples: + +* `pgtap-1.0.1+pg15-any` + * Package: `pgtap` + * Version: `1.0.1` + * Postgres: `pg15` + * Platform: `any` +* `pair-0.32.1-beta1+pg16-gnulinux-amd64` + * Package: `pair` + * Version: `0.32.1-beta1` + * Postgres: `pg16` + * OS: `gnulinux` + * Architecture: `amd64` +* `pair-0.32.1+pg16-darwin-23.5.0-arm64` + * Package: `pair` + * Version: `0.32.1` + * Postgres: `pg16` + * OS: `darwin` + * OS Version: `23.5.0` + * Architecture: `arm64` + +### File contents + +The contents of a trunk file should unpack into a directory with the same name +as the file, but without the `.trunk` extension. The contents of the directory +are: + +* `trunk.json` contains metadata necessary to install the extension. The + format will be the subject of a future RFC, but at a minimum will include the + trunk format version, package version, dependencies, license, language and + runtime (e.g., libc implementation and version), platform metadata, and + Postgres version and build configuration. A trunk installer should warn if + the trunk format version is greater than the version it supports, and must + fail if the trunk format version has a greater major version than the + version it supports. + +* `digests` contains a list of (almost) all the files in the trunk and their + secure hashes. Each line lists a single file and its checksum in the [BSD + digest format]: `{algorithm} ({filename}) = {checksum}`. Every file except + `digests` --- which cannot contain a hash of itself --- must be listed in + this file. The cryptographic hash algorithm must be [SHA-256] or better; + specifically, MD5 and SHA-1 are not permitted, as signed trunk files rely + on the strong hashes in `digests` to validate the integrity of the + archive. 
+ +* The `pgsql` directory contains one or more subdirectories named for + `pg_config` directory configurations: `bin`, `doc`, `html`, `include`, + `pkginclude`, `lib`, `pkglib`, `locale`, `man`, `share`, and `sysconf`. + Each contains the files to be installed in the corresponding `pg_config` + directory. + +* Dynamic language scripts must appear in `pgsql/bin` and begin with exactly + `#!{cmd}`, where `cmd` is the name of the interpreter, in order to enjoy + script wrapper generation and shebang rewriting at install time. They must + have no extension. The list of supported interpreters will depend on the + features of the installer, but one can reasonably expect support for + [Perl], [Python], and [Ruby]. If no appropriate instance of the given + interpreter is present, the installer may abort the installation. + +* `README`, `LICENSE`, and `CHANGELOG` may optionally be in the directory. + Each must be plain text or Markdown-formatted. In the latter case, they + may use the extension `.md`. + +* Trunk, being an installation format intended to install pre-compiled + binaries and supporting files, does not include a `Makefile`, `configure` + file or any other tool for building the package contents. + +During extraction, trunk installers verify all the hashes in `digests` against +the file contents. Apart from `digests` and its signatures, installation will +fail if any file in the archive is not both mentioned and correctly hashed in +`digests`. + +## Details + +### Installing a Trunk + +The following descriptions will use a trunk file named +`pair-0.32.1+pg16-gnulinux-amd64.trunk`. Trunk installation notionally +consists of two phases: + +1. Unpack + * Validate digests. Ensure that every file, aside from `digests` itself, + is listed in `digests` along with its valid hash digest. If any file is + missing or has an invalid digest, installation should fail. If a file + listed in `digests` is not present, installation should fail. + * Parse the `trunk.json` file. 
Check that the distribution is compatible + with: + * The trunk format version + * The platform (OS, OS version, and architecture); `any` is allowed + for any platform + * The PostgreSQL version +2. Install + * If applicable, update scripts starting with `#!{cmd}` to point to the + correct interpreter. Fail if no such interpreter is present. + * Iterate over each subdirectory of the `pgsql` directory. + * If the directory corresponds to a directory configuration from + [pg_config], install its contents in that target directory. + +## Drawbacks + +Many PostgreSQL extensions and applications are already distributed via +well-tested and -maintained packaging systems, including the community [Yum] +and [Apt] repositories. + +However, these systems serve a limited number of OSes; macOS and Windows, +while served by their own packaging systems ([Homebrew] and [Chocolatey], +among others), have access to fewer packages and are less integrated into +community package distribution. + +[PGXN] aims to be the canonical repository for all publicly-available +extensions, and to provide as many of them as possible in the same binary +format to a variety of OSes. The trunk format is a key component for realizing +that vision. + +## Rationale and alternatives + +This design is ideally suited to PostgreSQL extensions because it's built +around the installation and configuration options provided by [pg_config]. +This short list of directories into which to install appropriate distribution +files is universal across OSes, and therefore suitable for distributing +binaries for, ultimately, every OS supported by PostgreSQL itself. + +The alternatives available today include: + +* The community [Yum] and [Apt] repositories, which serve only Linux + systems and require separate packages tied to the file layouts of those + systems. 
The trunk format is OS-agnostic and provides files for any Linux + distribution, regardless of the location of the PostgreSQL + installation(s) on the file system. +* [PGXMan] supports only Debian and Ubuntu Linux systems, and being + downstream of the community [Apt] packages, is also dependent on their file + layouts. macOS support has been promised, but the project + has seen [little activity] in 2024. +* [Trunk] inspired the design documented here, which takes its name from + it. That format is limited to a few file types, and lacks support for + multiple OSes and architectures. This RFC may be considered an evolution + of that format. +* [StackBuilder] has little visibility or penetration beyond [EDB] Windows + customers. I am unable to find a public list of available extensions or a + description of the packaging format or how to contribute to it. + +Without the trunk binary distribution format, it will be difficult to build +and deliver cross-platform binary distributions of all the packages on PGXN. + +## Prior art + +The design of the trunk binary distribution format is inspired by the original +[Trunk] format, which demonstrated a pattern for distributing extensions +agnostic of file locations. This design may be considered an evolution of the +[Trunk] registry format. + +The design was also heavily inspired by the [Python wheel] format. Although +locations for installable files in the trunk format relate directly to +[pg_config] directories, most of the other aspects of the design were borrowed +from wheel, including the `digests` file and the `trunk.json` metadata file. + +## Unresolved questions + +* Should the archive format be Zip or tarball? PGXN has traditionally used + Zip, since it's supported everywhere, including Windows. So does the + [Python Wheel] format. But many other packaging systems use tarballs, + including [Homebrew] and [OCI]. The emerging idea to [distribute trunks + via OCI registries] may favor tarballs. 
+* The list of platforms to support and the strings to indicate them, + including CPU alternatives, will be defined in a forthcoming RFC. + +## Future possibilities + +Some other ideas for the format, in either the short or long term: + +* Adopt the [Python wheel signing pattern] +* Include an [SPDX SBOM](https://spdx.dev)? +* Allow non-Postgres libraries to be included, such as OS dependencies, + either in the appropriate `pgsql` subdirectory or perhaps in a separate + `sys` directory + +## References + +* [Python Binary distribution format][Python wheel] +* [trunk POC] +* [Previous discussion] + + [^wheel]: With much inspiration from and gratitude to the [Python wheel] + format. + [^hyphen]: Why hyphens? They allow the entire file name, between the package + name and the `.trunk` extension, to be a valid [SemVer]. + [^PEPs]: See for example [PEP 600] defining Python wheel tags for different + versions of GNU libc and [PEP 656] defining tags for different versions of + musl libc. See also how [Homebrew] uses [macOS version names] in file + names for its packages. 
+ + [PGXN]: https://pgxn.org "PostgreSQL Extension Network" + [PGXS]: https://www.postgresql.org/docs/current/extend-pgxs.html + "PostgreSQL Docs: Extension Building Infrastructure" + [pg_config]: https://www.postgresql.org/docs/current/app-pgconfig.html + "PostgreSQL Docs: pg_config" + [Python wheel]: https://packaging.python.org/en/latest/specifications/binary-distribution-format/ + [SemVer]: https://semver.org "Semantic Versioning 2.0.0" + [PEP 600]: https://peps.python.org/pep-0600/ + "PEP 600 – Future ‘manylinux’ Platform Tags for Portable Linux Built Distributions" + [PEP 656]: https://peps.python.org/pep-0656/ + "PEP 656 – Platform Tag for Linux Distributions Using Musl" + [Homebrew]: https://brew.sh "Homebrew: The Missing Package Manager for macOS (or Linux)" + [macOS version names]: https://github.com/oras-project/oras/issues/237#issuecomment-815250008 + "oras-project/oras#237 Comment from sjackman" + [BSD digest format]: https://stackoverflow.com/q/1299833/79202 + [SHA-256]: https://en.wikipedia.org/wiki/SHA-2 "Wikipedia: SHA-2" + [Perl]: https://perl.org "The Perl Programming Language" + [Python]: https://python.org "The Python Programming Language" + [Ruby]: https://ruby-lang.org/en/ "The Ruby Programming Language" + [Yum]: https://yum.postgresql.org "PostgreSQL Yum Repository" + [Apt]: https://wiki.postgresql.org/wiki/Apt "PostgreSQL packages for Debian and Ubuntu" + [Homebrew]: https://brew.sh "The Missing Package Manager for macOS (or Linux)" + [Chocolatey]: https://chocolatey.org "The Package Manager for Windows" + [PGXMan]: https://pgxman.com "npm for PostgreSQL" + [little activity]: https://github.com/pgxman/buildkit/commits/main/?since=2024-01-01&until=2024-07-11 + [Trunk]: https://pgt.dev "Trunk is an open-source package installer and registry for PostgreSQL extensions" + [StackBuilder]: https://www.enterprisedb.com/docs/supported-open-source/postgresql/installing/using_stackbuilder/ + [EDB]: https://www.enterprisedb.com "Enterprise DB" + [OCI]: 
https://github.com/opencontainers/image-spec/blob/main/media-types.md + "OCI Image Media Types" + [distribute trunks via OCI registries]: https://justatheory.com/2024/06/trunk-oci-poc/ + "POC: Distributing Trunk Binaries via OCI" + [Python wheel signing pattern]: https://packaging.python.org/en/latest/specifications/binary-distribution-format/#signed-wheel-files + "Python Binary distribution format: Signed wheel files" + [trunk POC]: https://gist.github.com/theory/7dc164e5772cae652d838a1c508972ae + "trunk POC using PGXS, bash, tar, shasum, and jq" + [Previous discussion]: https://github.com/orgs/pgxn/discussions/2 + "Proposal: Binary Distribution Format" From 93e353a271918b03572ef86f0054da0afe0f5bba Mon Sep 17 00:00:00 2001 From: "David E. Wheeler" Date: Fri, 12 Jul 2024 17:54:50 -0400 Subject: [PATCH 2/3] Simplify package name and semver parsing Made possible by RFC-3 disallowing digits after dashes in package names. --- text/0002-binary-distribution-format.md | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/text/0002-binary-distribution-format.md b/text/0002-binary-distribution-format.md index cf2ac2e..b35f6ef 100644 --- a/text/0002-binary-distribution-format.md +++ b/text/0002-binary-distribution-format.md @@ -121,11 +121,9 @@ Parsing of the file name takes place in four parts: 2. Split the name into two parts at the `+` sign. The left part is the package name and [SemVer]. The right part is the platform specification. -3. For the left part, split on the right-most dash. If the string to the - right of the dash is a valid [SemVer], then the left part is the package - name. If the right string is not a valid [SemVer], try again at the second - right-most dash and check again. Continue until a valid SemVer is produced - or else fail. +3. For the left part, read the package name, including dashes (`-`), until a + digit follows a dash. This dash indicates the end of the package name and + the start of the [SemVer]. 4. 
Split the right string on dashes. There will be between two and four values as follows: From e7cfa52dae3cca23f544446828e51b307ec4eba4 Mon Sep 17 00:00:00 2001 From: "David E. Wheeler" Date: Wed, 17 Jul 2024 14:55:38 -0400 Subject: [PATCH 3/3] Simplify package name/version parsing Made possible by forbidding dots (.) in Terms in the metadata spec (80702c3), making it impossible for a package name to include a semver. --- text/0002-binary-distribution-format.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/text/0002-binary-distribution-format.md b/text/0002-binary-distribution-format.md index b35f6ef..d316831 100644 --- a/text/0002-binary-distribution-format.md +++ b/text/0002-binary-distribution-format.md @@ -122,8 +122,9 @@ Parsing of the file name takes place in four parts: package name and [SemVer]. The right part is the platform specification. 3. For the left part, read the package name, including dashes (`-`), until a - digit follows a dash. This dash indicates the end of the package name and - the start of the [SemVer]. + dash is followed by one or more digits and a dot (`.`). This dash + indicates the end of the package name (which cannot contain dots) and the + start of the [SemVer]. 4. Split the right string on dashes. There will be between two and four values as follows:
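
The four parsing steps, as they stand after this patch series, could be sketched roughly as follows in Python. This is an illustration only, not part of the RFC or of any existing installer: `TrunkName` and `parse_trunk_name` are hypothetical names, and the regular expression only locates the version by the "dash, digits, dot" rule from step 3 without fully validating it as a SemVer.

```python
import re
from typing import NamedTuple, Optional


class TrunkName(NamedTuple):
    """Parsed parts of a trunk file name (hypothetical type, for illustration)."""
    package: str
    version: str
    postgres: str
    os: str                    # "any" when the trunk is platform-independent
    os_version: Optional[str]
    arch: Optional[str]


# Step 3: the package name runs until a dash is followed by one or more
# digits and a dot; everything after that dash is treated as the version.
# Assumption: the version is located, not fully validated as SemVer.
_NAME_VERSION = re.compile(r"^(?P<package>.+?)-(?P<version>\d+\..+)$")


def parse_trunk_name(filename: str) -> TrunkName:
    # Step 1: drop the extension, if any (directory prefixes have none).
    stem = filename[: -len(".trunk")] if filename.endswith(".trunk") else filename

    # Step 2: split at the "+" sign, which must appear exactly once.
    left, sep, right = stem.partition("+")
    if not sep or "+" in right:
        raise ValueError(f"expected exactly one '+' in {filename!r}")

    # Step 3: separate the package name from the version.
    match = _NAME_VERSION.match(left)
    if match is None:
        raise ValueError(f"no package name and version in {left!r}")

    # Step 4: two, three, or four dash-separated values.
    parts = right.split("-")
    if len(parts) == 2 and parts[1] == "any":
        os_name, os_version, arch = "any", None, None
    elif len(parts) == 3:
        os_name, os_version, arch = parts[1], None, parts[2]
    elif len(parts) == 4:
        os_name, os_version, arch = parts[1], parts[2], parts[3]
    else:
        raise ValueError(f"unrecognized platform specification {right!r}")

    return TrunkName(match["package"], match["version"], parts[0],
                     os_name, os_version, arch)
```

For example, `parse_trunk_name("pair-0.32.1+pg16-darwin-23.5.0-arm64.trunk")` yields package `pair`, version `0.32.1`, Postgres `pg16`, OS `darwin`, OS version `23.5.0`, and architecture `arm64`, matching the third parsing example in the RFC text. A real installer would presumably also validate the version with a proper SemVer parser and check the platform values against the lists to be defined in later RFCs.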