feat: add rewrite_fasta tool by ameynert · Pull Request #5 · fg-labs/divref-wf

ameynert · 2026-04-10T22:58:30Z

Summary

- Ports rewrite_fasta.py from human-diversity-reference/scripts as a defopt-compatible toolkit tool
- Filters a FASTA file to retain only canonical chromosomes (chr1–22, X, Y, MT)
- Fixes a potential UnboundLocalError in the original by initialising keep=False before the loop
- Removes the unused header_line variable from the original
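
The UnboundLocalError fix can be illustrated with a minimal sketch. The function names and the single-contig check below are hypothetical simplifications, not the original script: when a flag is first assigned inside the header branch, any sequence line that appears before the first header reads the flag before it exists.

```python
def count_kept_unsafe(lines):
    """Hypothetical simplification: `keep` is first assigned inside the loop."""
    kept = 0
    for line in lines:
        if line.startswith(">"):
            keep = line[1:].strip() == "chr1"
        elif keep:  # raises UnboundLocalError if no header has been seen yet
            kept += 1
    return kept


def count_kept_safe(lines):
    """Same loop with `keep` initialised before it, as the PR describes."""
    keep = False
    kept = 0
    for line in lines:
        if line.startswith(">"):
            keep = line[1:].strip() == "chr1"
        elif keep:
            kept += 1
    return kept
```

With the initialised flag, leading sequence lines that have no header are simply treated as not kept.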

Test plan

uv run --directory divref poe check-all passes

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added FASTA filtering tool to retain only canonical chromosome contigs (chr1–chr22, chrX, chrY, chrMT) while removing non-canonical sequences.
Tests
- Added comprehensive test coverage for FASTA filtering functionality.

coderabbitai · 2026-04-10T22:58:40Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f97523b4-6a4c-4a4a-8c88-069f4d0b9731

📥 Commits

Reviewing files that changed from the base of the PR and between df39e5e and d07c2c6.

📒 Files selected for processing (3)

.gitignore
divref/divref/tools/rewrite_fasta.py
divref/tests/tools/test_rewrite_fasta.py

✅ Files skipped from review due to trivial changes (3)

.gitignore
divref/divref/tools/rewrite_fasta.py
divref/tests/tools/test_rewrite_fasta.py

📝 Walkthrough

Walkthrough

Updated .gitignore to narrow the ignore pattern from **/claude to **/.claude. Added a new rewrite_fasta() tool that filters FASTA files to retain only canonical chromosome contigs (chr1–chr22, chrX, chrY, chrMT), with comprehensive test coverage.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration: `.gitignore` | Narrowed ignore pattern to target `.claude` (dot-prefixed) instead of `claude` entries. |
| FASTA Filtering Tool: `divref/divref/tools/rewrite_fasta.py` | New module implementing `rewrite_fasta()` function that streams input FASTA line-by-line, builds canonical contig set, and writes only headers and sequences matching canonical chromosome names to output. |
| Tool Tests: `divref/tests/tools/test_rewrite_fasta.py` | Comprehensive test suite validating FASTA filtering behavior: canonical contig retention, non-canonical contig filtering, mixed input handling, empty input, and multiline sequence preservation. |
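
The streaming filter summarised in the table can be sketched roughly as follows. This is not the code from the PR: the name `filter_canonical`, the `CANONICAL` constant, and the rule that the contig name is the first whitespace-delimited token of the header are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed canonical names, taken from the description: chr1..chr22, chrX, chrY, chrMT.
CANONICAL = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrMT"}


def filter_canonical(fasta_path: Path, output_path: Path) -> None:
    """Copy only canonical contigs from fasta_path to output_path."""
    keep = False  # nothing is written until a canonical header is seen
    with open(fasta_path) as f, open(output_path, "w") as out:
        for line in f:
            if line.startswith(">"):
                # Assumption: the contig name is the first token of the header.
                fields = line[1:].split()
                keep = bool(fields) and fields[0] in CANONICAL
            if keep:
                out.write(line)
```

Because the input is read one line at a time and only a single boolean is carried between lines, memory use stays flat regardless of contig length.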

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 With whiskers twitching, I filtered the DNA,
Keeping chromosomes canonical, tossing the gray,
Twenty-three keepers (plus X, Y, and MT divine),
Tests passed with a hop—the FASTA now shines! 🧬✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 34.78% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: add rewrite_fasta tool' accurately describes the main change: introducing a new rewrite_fasta tool module with accompanying tests. |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

divref/tests/conftest.py (1)
24-26: Consider adding a brief docstring for the datadir fixture.

Per coding guidelines, public functions/fixtures benefit from docstrings explaining their purpose.
📝 Proposed docstring
 @pytest.fixture
 def datadir() -> Path:
+    """Return the path to the test data directory."""
     return Path(__file__).parent / "data"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/tests/conftest.py` around lines 24 - 26, Add a brief docstring to the
datadir pytest fixture that describes its purpose and return value: explain that
datadir returns a pathlib.Path pointing to the tests "data" directory adjacent
to conftest.py so tests can load test data files; place the docstring
immediately below the def datadir(...) signature.
divref/tests/tools/test_rewrite_fasta.py (1)
1-53: Consider adding an error-case test for missing input file.

Per coding guidelines, new public functions require "at least one happy-path test + one error case." The happy-path coverage is excellent, but there's no test verifying behavior when the input file doesn't exist (expected: FileNotFoundError).
🧪 Suggested error case test
def test_raises_on_missing_input(tmp_path: Path) -> None:
    with pytest.raises(FileNotFoundError):
        rewrite_fasta(fasta_path=tmp_path / "nonexistent.fa", output_path=tmp_path / "out.fa")
As per coding guidelines: "New public functions require at least one happy-path test + one error case."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/tests/tools/test_rewrite_fasta.py` around lines 1 - 53, Add an
error-case test that asserts rewrite_fasta raises FileNotFoundError when the
input FASTA is missing: create a new test function (e.g.,
test_raises_on_missing_input) that calls rewrite_fasta(fasta_path=tmp_path /
"nonexistent.fa", output_path=tmp_path / "out.fa") inside a
pytest.raises(FileNotFoundError) context using the existing tmp_path fixture;
reference the rewrite_fasta function to locate where behavior is expected to
raise.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@divref/divref/tools/rewrite_fasta.py`:
- Line 24: Change the bare index access on the line sentinel to guard against
empty lines: replace the condition using line[0] (in the loop that processes
FASTA lines) with a check that the line is non-empty before inspecting the first
character (e.g., use line and line[0] == ">" or use line.startswith(">")).
Update the FASTA-reading loop in rewrite_fasta.py where the variable line is
tested so blank lines are skipped safely and no IndexError can occur.

---

Nitpick comments:
In `@divref/tests/conftest.py`:
- Around line 24-26: Add a brief docstring to the datadir pytest fixture that
describes its purpose and return value: explain that datadir returns a
pathlib.Path pointing to the tests "data" directory adjacent to conftest.py so
tests can load test data files; place the docstring immediately below the def
datadir(...) signature.

In `@divref/tests/tools/test_rewrite_fasta.py`:
- Around line 1-53: Add an error-case test that asserts rewrite_fasta raises
FileNotFoundError when the input FASTA is missing: create a new test function
(e.g., test_raises_on_missing_input) that calls
rewrite_fasta(fasta_path=tmp_path / "nonexistent.fa", output_path=tmp_path /
"out.fa") inside a pytest.raises(FileNotFoundError) context using the existing
tmp_path fixture; reference the rewrite_fasta function to locate where behavior
is expected to raise.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d5886734-3cee-4947-aa92-b18a1059d690

📥 Commits

Reviewing files that changed from the base of the PR and between 403ece8 and df39e5e.

📒 Files selected for processing (8)

.gitignore
divref/divref/haplotype.py
divref/divref/tools/rewrite_fasta.py
divref/tests/conftest.py
divref/tests/data/test.fa
divref/tests/data/test.fa.fai
divref/tests/test_haplotype.py
divref/tests/tools/test_rewrite_fasta.py

coderabbitai · 2026-04-14T17:49:49Z

divref/divref/tools/rewrite_fasta.py

+    keep = False
+    with open(fasta_path) as f, open(output_path, "w") as out:
+        for line in tqdm.tqdm(f):
+            if line[0] == ">":


⚠️ Potential issue | 🟡 Minor

Potential IndexError on empty lines in the FASTA file.

If the input FASTA contains blank lines (e.g., between contigs or trailing newlines), line[0] will raise an IndexError. While well-formed FASTA files typically don't have empty lines, defensive handling would improve robustness.

🛡️ Proposed fix

         for line in tqdm.tqdm(f):
+            if not line.strip():
+                continue
             if line[0] == ">":

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@divref/divref/tools/rewrite_fasta.py` at line 24, Change the bare index access on the line sentinel to guard against empty lines: replace the condition using line[0] (in the loop that processes FASTA lines) with a check that the line is non-empty before inspecting the first character (e.g., use line and line[0] == ">" or use line.startswith(">")). Update the FASTA-reading loop in rewrite_fasta.py where the variable line is tested so blank lines are skipped safely and no IndexError can occur.
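
A minimal sketch of the guarded header check follows. The helper name `header_names` is hypothetical and the loop is simplified from the diff above. One detail worth keeping in mind: when a text file object is iterated directly, a blank line is yielded as `"\n"` rather than `""`, so `line[0]` is then a newline character. The guard matters most when lines are stripped first or come from another source, and `startswith` is safe in either case.

```python
import io


def header_names(handle):
    """Return contig names from FASTA headers, skipping blank lines."""
    names = []
    for line in handle:
        if not line.strip():
            continue  # whitespace-only line: nothing to classify
        if line.startswith(">"):  # safe even if line were ""
            names.append(line[1:].strip())
    return names


handle = io.StringIO(">chr1\nACGT\n\n>chrX\nTTTT\n")
print(header_names(handle))  # ['chr1', 'chrX']
```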

Port rewrite_fasta.py from human-diversity-reference/scripts as a defopt-compatible toolkit tool. Filters a FASTA file to keep only canonical chromosomes (chr1-22, X, Y, MT). Fixes a potential UnboundLocalError by initialising keep=False before the loop and removes the unused header_line variable from the original.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds tests/tools/test_rewrite_fasta.py covering rewrite_fasta() using pytest's tmp_path fixture:
- Canonical autosomes (chr1, chr22) are kept
- Sex chromosomes (chrX, chrY) and chrMT are kept
- Alt contigs (chr1_alt) and decoys (chrUn_gl000220, chrEBV) are filtered
- Mixed input with canonical and non-canonical contigs interleaved
- Empty input produces empty output
- Multi-line sequences are written in full

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
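
The cases listed in this commit message could be expressed as a small data-driven check along these lines. This is a sketch, not the contents of test_rewrite_fasta.py: the `CASES` table, the `run_cases` helper, and the expected outputs (headers and sequence lines copied through verbatim) are assumptions. Only the keyword arguments `fasta_path` and `output_path` come from the review comment above.

```python
from pathlib import Path
from typing import Callable

# Hypothetical inputs and the outputs a canonical-only filter is assumed to give.
CASES = {
    "canonical kept": (">chr1\nACGT\n>chr22\nTTTT\n", ">chr1\nACGT\n>chr22\nTTTT\n"),
    "alt and decoy dropped": (">chr1_alt\nACGT\n>chrEBV\nTTTT\n", ""),
    "mixed": (
        ">chr1\nAAAA\n>chrUn_gl000220\nCCCC\n>chrX\nGGGG\n",
        ">chr1\nAAAA\n>chrX\nGGGG\n",
    ),
    "empty": ("", ""),
    "multiline": (">chrMT\nACGT\nTTTT\nGGGG\n", ">chrMT\nACGT\nTTTT\nGGGG\n"),
}


def run_cases(rewrite: Callable[..., None], workdir: Path) -> list[str]:
    """Run each case through `rewrite` and return the names of failing cases."""
    failures = []
    for i, (name, (text, expected)) in enumerate(CASES.items()):
        src = workdir / f"in_{i}.fa"
        dst = workdir / f"out_{i}.fa"
        src.write_text(text)
        rewrite(fasta_path=src, output_path=dst)
        if dst.read_text() != expected:
            failures.append(name)
    return failures
```

In a pytest module, `workdir` would typically be the `tmp_path` fixture and the test would assert that `run_cases(rewrite_fasta, tmp_path)` returns an empty list.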

ameynert had a problem deploying to github-actions-snakemake-linting April 10, 2026 22:58 — with GitHub Actions Failure

ameynert force-pushed the am_03_haplotype_utils branch from f98f6db to 5245c8e Compare April 10, 2026 23:16

ameynert force-pushed the am_04_rewrite_fasta branch from 3c51270 to 88ee1de Compare April 10, 2026 23:17

ameynert temporarily deployed to github-actions-snakemake-linting April 10, 2026 23:17 — with GitHub Actions Inactive

ameynert force-pushed the am_03_haplotype_utils branch from 5245c8e to 06e1fab Compare April 13, 2026 18:18

Base automatically changed from am_03_haplotype_utils to main April 13, 2026 23:02

ameynert force-pushed the am_04_rewrite_fasta branch from 88ee1de to df39e5e Compare April 14, 2026 17:40

ameynert temporarily deployed to github-actions-snakemake-linting April 14, 2026 17:40 — with GitHub Actions Inactive

coderabbitai bot reviewed Apr 14, 2026

ameynert temporarily deployed to github-actions-snakemake-linting April 14, 2026 18:14 — with GitHub Actions Inactive

ameynert force-pushed the am_04_rewrite_fasta branch from 24a7f04 to a2f8fa4 Compare April 14, 2026 18:14

ameynert temporarily deployed to github-actions-snakemake-linting April 14, 2026 18:14 — with GitHub Actions Inactive

ameynert and others added 3 commits April 14, 2026 13:47

chore: update .gitignore

d07c2c6

ameynert force-pushed the am_04_rewrite_fasta branch from a2f8fa4 to d07c2c6 Compare April 14, 2026 20:47

ameynert temporarily deployed to github-actions-snakemake-linting April 14, 2026 20:47 — with GitHub Actions Inactive

ameynert merged commit d33f60f into main Apr 14, 2026
4 checks passed

ameynert deleted the am_04_rewrite_fasta branch April 14, 2026 20:52

ameynert merged 3 commits into main from am_04_rewrite_fasta
