@wwood commented Jan 7, 2026

Motivation

  • Provide an option to export protein sequences that matched HMM hits during supplement so users can inspect matched proteins alongside the matched transcripts.
  • Keep the same read renaming convention using the delimiter so sequence identifiers remain compatible with downstream code.

Description

  • Added a new CLI argument --output-matched-protein-sequences in singlem/main.py and passed it into the supplement workflow as output_matched_protein_sequences.
  • Propagated the argument through Supplementor.supplement, updated generate_new_metapackage and gather_hmmsearch_results signatures, and passed the value into worker calls.
  • Implemented writing of matched protein sequences inside run_hmmsearch_on_one_genome, which appends a protein FASTA entry for each transcript in matched_transcript_ids. When the option is requested, gather_hmmsearch_results creates or truncates the output file beforehand.
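
The truncate-once, append-per-genome pattern described above can be sketched as follows. This is a minimal illustration, not the code in this PR: only the flag name --output-matched-protein-sequences and the name matched_transcript_ids come from the description. The helper names (build_parser, prepare_output, append_matched_proteins), the plain argparse setup, and the dict of protein sequences are hypothetical stand-ins, and the sketch assumes appends from different genomes do not interleave.

```python
import argparse


def build_parser():
    # Hypothetical parser; the real option is registered in singlem/main.py.
    parser = argparse.ArgumentParser(prog="supplement-sketch")
    parser.add_argument(
        "--output-matched-protein-sequences",
        metavar="FASTA",
        default=None,
        help="Write proteins that matched HMM hits to this FASTA file",
    )
    return parser


def prepare_output(path):
    """Create or truncate the output file once, before any genome is processed."""
    if path is not None:
        open(path, "w").close()


def append_matched_proteins(path, genome_proteins, matched_transcript_ids):
    """Append one FASTA entry per matched transcript; return the number written."""
    if path is None:
        return 0  # option not requested, nothing to do
    written = 0
    with open(path, "a") as out:
        for seq_id, seq in genome_proteins.items():
            if seq_id in matched_transcript_ids:
                out.write(f">{seq_id}\n{seq}\n")
                written += 1
    return written


if __name__ == "__main__":
    args = build_parser().parse_args()
    out_path = args.output_matched_protein_sequences
    prepare_output(out_path)
    # Two toy "genomes", each processed by its own call (made-up sequences).
    append_matched_proteins(out_path, {"g1_1": "MKV", "g1_2": "MAA"}, {"g1_1"})
    append_matched_proteins(out_path, {"g2_1": "MLL"}, {"g2_1"})
```

Truncating in one place before any per-genome call means a rerun never mixes stale entries with new ones, while each per-genome call only needs to open the file in append mode.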

Testing

  • Ran the full test suite with pixi run -e dev pytest test as required by project docs.
  • Test run collected 244 items and completed with 2 failed, 222 passed, 20 skipped, 1 warning.
  • The two failing tests are test/test_pipe.py::Tests::test_read_chunk_size_forward_gzip and test/test_pipe.py::Tests::test_read_chunk_size_paired; both failed on a validation error stating that --read-chunk-size must be divisible by 4.
  • All other tests passed or were skipped; neither failure involves the supplement changes.

@wwood merged commit febad72 into main on Jan 8, 2026
0 of 4 checks passed
@wwood deleted the codex/add-output-matched-protein-sequences-option branch on January 8, 2026 05:45