Fix vendored ncbi-vdb publishing so crates.io installs work without --no-verify #15

@nh13

Context

v0.1.0 of fg-sra-vdb-sys, fg-sra-vdb, and fg-sra were published to crates.io manually with --no-verify, purely to park the namespace. The automated publish workflow introduced in #14 also uses --no-verify as a workaround. This means:

  • cargo install fg-sra (with the default vendored feature) currently fails for anyone installing from crates.io, with an error like "vendor/ncbi-vdb not found; did you initialize the git submodule?".
  • Users can install via the bioconda recipe (see bioconda/bioconda-recipes#64283, "Add fg-sra") or by cloning the repo with submodules and running cargo install --path crates/fg-sra, but not from crates.io.

This issue tracks the work required to ship a functional release via crates.io.

Root causes

  1. Submodule location. The vendor/ncbi-vdb git submodule lives at the workspace root, outside the fg-sra-vdb-sys crate directory. cargo publish only packages files inside the crate, so the submodule source is not included in the tarball. When cargo's verification step (or a downstream cargo install) unpacks the tarball and runs build.rs, the vendored code path panics because vendor/ncbi-vdb does not exist.

  2. Tarball size. The full vendor/ncbi-vdb git archive is ~19 MB gzipped — above the crates.io 10 MB per-crate limit. Pruning vendor/ncbi-vdb/test/ (~22 MB uncompressed) and vendor/ncbi-vdb/py_vdb/ (~188 KB) brings the gzipped payload to ~5.3 MB, well under the limit.

  3. Out-of-tree cmake build. Even when the submodule is available inside the crate directory, ncbi-vdb's CMake configure step drops an a.out file into its source tree (probably from a try_compile probe). Cargo flags this as a build-script side effect with:

    Source directory was modified by build.rs during cargo publish. Build scripts should not modify anything outside of OUT_DIR.
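One way to confirm which files the configure step leaves behind (root cause 3) is to snapshot the source tree before and after running it and diff the two listings. The following is a minimal diagnostic sketch using only the standard library; the helper names `snapshot` and `new_files` are hypothetical and are not part of the existing build.rs.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect the relative paths of all non-directory entries under `root`.
fn snapshot(root: &Path) -> io::Result<BTreeSet<PathBuf>> {
    let mut files = BTreeSet::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            if entry.file_type()?.is_dir() {
                stack.push(path);
            } else {
                // Store paths relative to `root` so two snapshots compare cleanly.
                let rel = path
                    .strip_prefix(root)
                    .expect("entry is under root")
                    .to_path_buf();
                files.insert(rel);
            }
        }
    }
    Ok(files)
}

/// Paths present in `after` but not in `before`, in sorted order.
fn new_files(before: &BTreeSet<PathBuf>, after: &BTreeSet<PathBuf>) -> Vec<PathBuf> {
    after.difference(before).cloned().collect()
}

fn main() -> io::Result<()> {
    // Directory to inspect; defaults to the current directory.
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let before = snapshot(Path::new(&root))?;
    // ... run the CMake configure step here ...
    let after = snapshot(Path::new(&root))?;
    for path in new_files(&before, &after) {
        println!("new file in source tree: {}", path.display());
    }
    Ok(())
}
```

Run against the vendored source directory, this would list a stray a.out (and anything else the configure step writes) without relying on cargo publish to report it.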

Options

Any one of these would unblock a functional crates.io release:

  • (a) Relocate the submodule. git mv vendor/ncbi-vdb crates/fg-sra-vdb-sys/vendor/ncbi-vdb, update .gitmodules and build.rs, and add exclude = ["vendor/ncbi-vdb/test/**", "vendor/ncbi-vdb/py_vdb/**", "vendor/ncbi-vdb/.git*"] to crates/fg-sra-vdb-sys/Cargo.toml. Also fix the out-of-tree build issue (see below) so we can drop --no-verify. Local testing confirmed the pruned tarball is ~5.2 MB, 1707 files, well under the 10 MB limit.

  • (b) Copy the vendored source into OUT_DIR before invoking cmake. Leaves the submodule where it is, but the build_ncbi_vdb function first copies vendor/ncbi-vdb into $OUT_DIR/ncbi-vdb and runs cmake there. This also fixes the a.out-in-source-tree problem since nothing writes to the original source tree anymore. Downside: extra copy on every build.

  • (c) Drop the vendored feature from the published crates. Flip fg-sra's default features so vendored is off by default, and document that crates.io users must pre-install ncbi-vdb and set VDB_INCDIR / VDB_LIBDIR. Simplest, worst UX — but the bioconda recipe already works this way, so crates.io would just be "for people who know what they're doing."

  • (d) Switch to dynamic linking. build.rs currently does rustc-link-lib=static=ncbi-vdb. Supporting rustc-link-lib=dylib=ncbi-vdb (either automatically detected from the lib dir or controlled by a VDB_LINK_TYPE env var) is orthogonal to the vendored/submodule problem but worth considering while we're in here. Not strictly required — bioconda's ncbi-vdb package actually ships libncbi-vdb.a at $PREFIX/lib/libncbi-vdb.a, so static linking already works.
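For option (d), the link-kind selection could look roughly like the sketch below. It assumes the VDB_LINK_TYPE variable proposed above accepts the values "static" and "dylib", and that auto-detection means "use static if libncbi-vdb.a exists in the lib dir, otherwise fall back to dylib". Both are assumptions for illustration; the helper names `choose_link_kind` and `link_directive` are hypothetical and not taken from the current build.rs.

```rust
use std::path::Path;

/// Link kind to request from rustc for ncbi-vdb.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LinkKind {
    Static,
    Dylib,
}

impl LinkKind {
    fn as_str(self) -> &'static str {
        match self {
            LinkKind::Static => "static",
            LinkKind::Dylib => "dylib",
        }
    }
}

/// An explicit override wins; otherwise prefer the static archive when it is
/// present in `lib_dir`, and fall back to dynamic linking when it is not.
fn choose_link_kind(override_value: Option<&str>, lib_dir: &Path) -> Result<LinkKind, String> {
    match override_value {
        Some("static") => Ok(LinkKind::Static),
        Some("dylib") => Ok(LinkKind::Dylib),
        Some(other) => Err(format!("unsupported VDB_LINK_TYPE: {other}")),
        None => {
            if lib_dir.join("libncbi-vdb.a").is_file() {
                Ok(LinkKind::Static)
            } else {
                Ok(LinkKind::Dylib)
            }
        }
    }
}

/// Build the cargo directive line for the chosen link kind.
fn link_directive(kind: LinkKind) -> String {
    format!("cargo:rustc-link-lib={}=ncbi-vdb", kind.as_str())
}

fn main() {
    // VDB_LIBDIR is the variable named in option (c); "." is only a
    // placeholder default so the sketch runs standalone.
    let lib_dir = std::env::var("VDB_LIBDIR").unwrap_or_else(|_| ".".to_string());
    let override_value = std::env::var("VDB_LINK_TYPE").ok();
    let kind = choose_link_kind(override_value.as_deref(), Path::new(&lib_dir))
        .expect("invalid VDB_LINK_TYPE");
    println!("cargo:rustc-link-search=native={lib_dir}");
    println!("{}", link_directive(kind));
}
```

Keeping the decision in a pure function makes it testable outside of a build script. Whether the auto-detected fallback should be dylib or a hard error is a design choice this sketch does not settle.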

My recommendation is (a) plus a fix for the cmake out-of-tree issue. This is the "do it once, forget about it" path and the one I have the most signal on: the relocation has been exercised locally end-to-end except for the a.out problem, which the copy-to-OUT_DIR trick from (b) would also solve if baked into the relocation PR.
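The copy-to-OUT_DIR step could be sketched as follows, using only the standard library. This is an illustration under assumptions, not the existing build_ncbi_vdb code: the helper name `copy_tree` is hypothetical, the vendor/ncbi-vdb path is resolved relative to CARGO_MANIFEST_DIR (which matches the relocated layout from option (a)), and symlinks are skipped for simplicity.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copy regular files and directories from `src` into `dst`.
/// Entries named `.git` are skipped; symlinks and other special files are
/// ignored in this sketch.
fn copy_tree(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        if entry.file_name() == ".git" {
            continue;
        }
        let from = entry.path();
        let to = dst.join(entry.file_name());
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            copy_tree(&from, &to)?;
        } else if file_type.is_file() {
            fs::copy(&from, &to)?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Cargo sets both variables for build scripts; the fallbacks only exist
    // so the sketch can run standalone.
    let manifest_dir = std::env::var("CARGO_MANIFEST_DIR").unwrap_or_else(|_| ".".to_string());
    let out_dir = std::env::var("OUT_DIR")
        .unwrap_or_else(|_| std::env::temp_dir().to_string_lossy().into_owned());

    let src = Path::new(&manifest_dir).join("vendor/ncbi-vdb");
    let staged = Path::new(&out_dir).join("ncbi-vdb");

    if src.is_dir() {
        copy_tree(&src, &staged)?;
        // CMake would then be pointed at `staged`, so configure-time
        // artifacts such as a.out land under OUT_DIR.
        println!("cargo:rerun-if-changed={}", src.display());
    } else {
        eprintln!("{} not found; nothing staged", src.display());
    }
    Ok(())
}
```

The copy runs whenever the build script runs, which is the downside already noted for option (b). If the vendored tree relies on symlinks, the sketch would need to be extended to recreate them.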

Acceptance criteria

  • cargo publish -p fg-sra-vdb-sys succeeds without --no-verify on a release branch
  • cargo publish -p fg-sra-vdb succeeds without --no-verify
  • cargo publish -p fg-sra succeeds without --no-verify
  • cargo install fg-sra from a fresh cache pulls from crates.io and builds successfully (with CMake available on the host)
  • --no-verify flags are removed from .github/workflows/publish.yml (undoes that part of ci: automate crates.io publishing via trusted publishing #14)
