Use polars for collapsing archive OTU tables #285

Merged
wwood merged 1 commit into main from codex/fix-test-collapse-to-sample-name-using-polars
Jan 28, 2026

@wwood commented Jan 28, 2026

Motivation

  • The write_collapsed_paired_with_unpaired_otu_table implementation used pandas and produced failures in the collapse-to-sample tests; switching to polars enables safer, typed ingestion and deterministic grouping/aggregation for archive OTU tables.

Description

  • Replaced pandas usage in write_collapsed_paired_with_unpaired_otu_table with polars and removed the pandas import.
  • Added archive_schema to map archive fields to typed Polars schema and read rows with orient="row" to avoid orientation warnings.
  • Concatenated archives with pl.concat, removed sample suffixes using pl string ops, checked unique constraints via Polars, and sorted by sequence/gene to preserve deterministic order.
  • Collapsed groups using partition_by and manual aggregation to preserve list-valued fields (read_names, nucleotides_aligned, read_unaligned_sequences, equal_best_hit_taxonomies) and produce ar.data as the expected list-of-lists structure.

Testing

  • Ran pixi run -e dev pytest test/test_summariser.py, which succeeded (28 passed, 1 skipped).
  • Ran the full test suite with pixi run -e dev pytest test, which completed but had one unrelated failure in test/test_pipe.py::Tests::test_sample_name_strange_characters caused by an external diamond process exit (exit status -7), not by the summariser changes.
  • Iterative test-fix cycles were performed while developing the change until the summariser tests were green.

Codex Task

@wwood wwood merged commit 4d238de into main Jan 28, 2026
4 checks passed
@wwood wwood deleted the codex/fix-test-collapse-to-sample-name-using-polars branch January 28, 2026 04:05