
feat: add rewrite_fasta tool#5

Merged
ameynert merged 3 commits into main from am_04_rewrite_fasta on Apr 14, 2026

ameynert (Collaborator) commented Apr 10, 2026

Summary

  • Ports rewrite_fasta.py from human-diversity-reference/scripts as a defopt-compatible toolkit tool
  • Filters a FASTA file to retain only canonical chromosomes (chr1–22, X, Y, MT)
  • Fixes a potential UnboundLocalError in the original by initialising keep=False before the loop
  • Removes the unused header_line variable from the original
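The filtering described above amounts to a single streaming pass with a `keep` flag that is set on each header line. The sketch below is illustrative rather than the merged code. It assumes the canonical names are exactly those listed in the PR, and that the contig name is the first whitespace-delimited token after `>`. It uses `line.startswith(">")` in place of `line[0] == ">"`, and it omits the `tqdm.tqdm(...)` progress wrapper so it has no third-party dependencies.

```python
from pathlib import Path

# Assumption: canonical names exactly as listed in the PR (chr1-chr22, chrX, chrY, chrMT).
CANONICAL_CONTIGS = frozenset([f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrMT"])


def rewrite_fasta(fasta_path: Path, output_path: Path) -> None:
    """Write only the canonical-chromosome records of fasta_path to output_path."""
    # Initialised before the loop: sequence lines that appear before the first
    # header are dropped instead of raising UnboundLocalError.
    keep = False
    with open(fasta_path) as f, open(output_path, "w") as out:
        for line in f:  # the PR wraps f in tqdm.tqdm(...) for a progress bar
            if line.startswith(">"):
                # Assumption: the contig name is the first whitespace-delimited
                # token after ">" (e.g. ">chr1 some description" -> "chr1").
                tokens = line[1:].split()
                keep = bool(tokens) and tokens[0] in CANONICAL_CONTIGS
            if keep:
                # Headers and sequence lines of kept contigs are copied verbatim,
                # so multi-line sequences are preserved in full.
                out.write(line)
```

Because the file is read lazily one line at a time, memory use stays flat even for whole-genome references.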

Test plan

  • uv run --directory divref poe check-all passes

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added FASTA filtering tool to retain only canonical chromosome contigs (chr1–chr22, chrX, chrY, chrMT) while removing non-canonical sequences.
  • Tests

    • Added comprehensive test coverage for FASTA filtering functionality.

ameynert had a problem deploying to github-actions-snakemake-linting April 10, 2026 22:58 with GitHub Actions (Failure)
coderabbitai bot commented Apr 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f97523b4-6a4c-4a4a-8c88-069f4d0b9731

📥 Commits

Reviewing files that changed from the base of the PR and between df39e5e and d07c2c6.

📒 Files selected for processing (3)
  • .gitignore
  • divref/divref/tools/rewrite_fasta.py
  • divref/tests/tools/test_rewrite_fasta.py
✅ Files skipped from review due to trivial changes (3)
  • .gitignore
  • divref/divref/tools/rewrite_fasta.py
  • divref/tests/tools/test_rewrite_fasta.py

📝 Walkthrough

Updated .gitignore to narrow the ignore pattern from **/claude to **/.claude. Added a new rewrite_fasta() tool that filters FASTA files to retain only canonical chromosome contigs (chr1–chr22, chrX, chrY, chrMT), with comprehensive test coverage.

Changes

  • Configuration (.gitignore): Narrowed the ignore pattern to target .claude (dot-prefixed) instead of claude entries.
  • FASTA Filtering Tool (divref/divref/tools/rewrite_fasta.py): New module implementing a rewrite_fasta() function that streams the input FASTA line by line, builds the canonical contig set, and writes only headers and sequences matching canonical chromosome names to the output.
  • Tool Tests (divref/tests/tools/test_rewrite_fasta.py): Comprehensive test suite validating FASTA filtering behavior: canonical contig retention, non-canonical contig filtering, mixed input handling, empty input, and multiline sequence preservation.
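The test scenarios listed above could also be checked without touching the filesystem by separating the line filter from file handling. The sketch below assumes a hypothetical generator, `filter_canonical`, that is not part of the PR. Per the commit message, the PR's own tests use pytest's `tmp_path` with real files. `check_scenarios` is also illustrative: it restates the listed cases as plain assertions over lists of lines.

```python
from collections.abc import Iterable, Iterator

# Hypothetical helper, not part of the PR; canonical names as listed in the PR.
CANONICAL = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrMT"}


def filter_canonical(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the header and sequence lines that belong to canonical contigs."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in CANONICAL
        if keep:
            yield line


def check_scenarios() -> None:
    """Restate the listed test scenarios as plain assertions."""
    # Canonical autosomes, sex chromosomes and chrMT are kept.
    for name in ("chr1", "chr22", "chrX", "chrY", "chrMT"):
        assert list(filter_canonical([f">{name}\n", "ACGT\n"])) == [f">{name}\n", "ACGT\n"]
    # Alt contigs and decoys are filtered out.
    for name in ("chr1_alt", "chrUn_gl000220", "chrEBV"):
        assert list(filter_canonical([f">{name}\n", "ACGT\n"])) == []
    # Interleaved input keeps only canonical records, with multi-line sequences in full.
    mixed = [">chr1\n", "AC\n", "GT\n", ">chrEBV\n", "NN\n", ">chrY\n", "TT\n"]
    assert list(filter_canonical(mixed)) == [">chr1\n", "AC\n", "GT\n", ">chrY\n", "TT\n"]
    # Empty input produces empty output.
    assert list(filter_canonical([])) == []


check_scenarios()
```

The trade-off is one extra layer of indirection. In return, each scenario becomes a one-line list comparison, and the file-handling wrapper only needs a single integration test.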

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 With whiskers twitching, I filtered the DNA,
Keeping chromosomes canonical, tossing the gray,
Twenty-two keepers (plus X, Y, and MT divine),
Tests passed with a hop—the FASTA now shines! 🧬✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 34.78%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title 'feat: add rewrite_fasta tool' accurately describes the main change: introducing a new rewrite_fasta tool module with accompanying tests.


ameynert force-pushed the am_03_haplotype_utils branch from f98f6db to 5245c8e April 10, 2026 23:16
ameynert force-pushed the am_04_rewrite_fasta branch from 3c51270 to 88ee1de April 10, 2026 23:17
ameynert temporarily deployed to github-actions-snakemake-linting April 10, 2026 23:17 with GitHub Actions (Inactive)
ameynert force-pushed the am_03_haplotype_utils branch from 5245c8e to 06e1fab April 13, 2026 18:18
Base automatically changed from am_03_haplotype_utils to main April 13, 2026 23:02
ameynert force-pushed the am_04_rewrite_fasta branch from 88ee1de to df39e5e April 14, 2026 17:40
ameynert temporarily deployed to github-actions-snakemake-linting April 14, 2026 17:40 with GitHub Actions (Inactive)
coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
divref/tests/conftest.py (1)

24-26: Consider adding a brief docstring for the datadir fixture.

Per coding guidelines, public functions/fixtures benefit from docstrings explaining their purpose.

📝 Proposed docstring
 `@pytest.fixture`
 def datadir() -> Path:
+    """Return the path to the test data directory."""
     return Path(__file__).parent / "data"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/tests/conftest.py` around lines 24 - 26, Add a brief docstring to the
datadir pytest fixture that describes its purpose and return value: explain that
datadir returns a pathlib.Path pointing to the tests "data" directory adjacent
to conftest.py so tests can load test data files; place the docstring
immediately below the def datadir(...) signature.
divref/tests/tools/test_rewrite_fasta.py (1)

1-53: Consider adding an error-case test for missing input file.

Per coding guidelines, new public functions require "at least one happy-path test + one error case." The happy-path coverage is excellent, but there's no test verifying behavior when the input file doesn't exist (expected: FileNotFoundError).

🧪 Suggested error case test
def test_raises_on_missing_input(tmp_path: Path) -> None:
    with pytest.raises(FileNotFoundError):
        rewrite_fasta(fasta_path=tmp_path / "nonexistent.fa", output_path=tmp_path / "out.fa")

As per coding guidelines: "New public functions require at least one happy-path test + one error case."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/tests/tools/test_rewrite_fasta.py` around lines 1 - 53, Add an
error-case test that asserts rewrite_fasta raises FileNotFoundError when the
input FASTA is missing: create a new test function (e.g.,
test_raises_on_missing_input) that calls rewrite_fasta(fasta_path=tmp_path /
"nonexistent.fa", output_path=tmp_path / "out.fa") inside a
pytest.raises(FileNotFoundError) context using the existing tmp_path fixture;
reference the rewrite_fasta function to locate where behavior is expected to
raise.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d5886734-3cee-4947-aa92-b18a1059d690

📥 Commits

Reviewing files that changed from the base of the PR and between 403ece8 and df39e5e.

📒 Files selected for processing (8)
  • .gitignore
  • divref/divref/haplotype.py
  • divref/divref/tools/rewrite_fasta.py
  • divref/tests/conftest.py
  • divref/tests/data/test.fa
  • divref/tests/data/test.fa.fai
  • divref/tests/test_haplotype.py
  • divref/tests/tools/test_rewrite_fasta.py

    keep = False
    with open(fasta_path) as f, open(output_path, "w") as out:
        for line in tqdm.tqdm(f):
            if line[0] == ">":

⚠️ Potential issue | 🟡 Minor

Potential IndexError on empty lines in the FASTA file.

If the input FASTA contains blank lines (e.g., between contigs or trailing newlines), line[0] will raise an IndexError. While well-formed FASTA files typically don't have empty lines, defensive handling would improve robustness.

🛡️ Proposed fix
         for line in tqdm.tqdm(f):
+            if not line.strip():
+                continue
             if line[0] == ">":
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/divref/tools/rewrite_fasta.py` at line 24, Change the bare index
access on the line sentinel to guard against empty lines: replace the condition
using line[0] (in the loop that processes FASTA lines) with a check that the
line is non-empty before inspecting the first character (e.g., use line and
line[0] == ">" or use line.startswith(">")). Update the FASTA-reading loop in
rewrite_fasta.py where the variable line is tested so blank lines are skipped
safely and no IndexError can occur.
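One detail is worth checking against this comment. Python's file iteration yields each line with its trailing newline, so a blank line in the FASTA arrives as "\n", and line[0] returns "\n" instead of raising. An IndexError needs a genuinely empty string, which iterating over a file does not produce. Assuming the loop writes every non-header line while keep is True, blank lines instead fall through the header test and are treated like sequence lines. The proposed guard is still useful because it drops them, and line.startswith(">") is safe for any string. In the sketch below, io.StringIO stands in for an open text file; FASTA_TEXT, iterate_lines and is_header are illustrative names.

```python
import io

# Sample FASTA text with a blank line between records and a trailing blank line.
FASTA_TEXT = ">chr1\nACGT\n\n>chrEBV\nNNNN\n\n"


def iterate_lines(text: str) -> list[str]:
    """Return the lines that `for line in f:` would yield for this text."""
    return list(io.StringIO(text))


def is_header(line: str) -> bool:
    """Header test that is safe for any string, including the empty string."""
    return line.startswith(">")


lines = iterate_lines(FASTA_TEXT)
# Every line keeps its trailing "\n", so the blank lines arrive as "\n" and
# lines[2][0] is "\n" rather than an IndexError.
print(lines)
print([is_header(line) for line in lines])
# Only a truly empty string makes line[0] fail; startswith just returns False.
print(is_header(""))
# The proposed guard drops the blank lines before the header test runs.
print([line for line in lines if line.strip()])
```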

ameynert temporarily deployed to github-actions-snakemake-linting April 14, 2026 18:14 with GitHub Actions (Inactive)
ameynert force-pushed the am_04_rewrite_fasta branch from 24a7f04 to a2f8fa4 April 14, 2026 18:14
ameynert temporarily deployed to github-actions-snakemake-linting April 14, 2026 18:14 with GitHub Actions (Inactive)
ameynert and others added 3 commits April 14, 2026 13:47
Port rewrite_fasta.py from human-diversity-reference/scripts as a
defopt-compatible toolkit tool. Filters a FASTA file to keep only
canonical chromosomes (chr1-22, X, Y, MT). Fixes a potential
UnboundLocalError by initialising keep=False before the loop and
removes the unused header_line variable from the original.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
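The UnboundLocalError fix in this commit can be illustrated in reduced form. If keep is assigned only inside the header branch, a file whose first line is not a header reaches `if keep:` before any assignment. Python treats keep as a local variable because the function assigns it, so that read raises UnboundLocalError. The three functions below are hypothetical reductions of the pattern, not the original script. For brevity, only "chr1" counts as canonical.

```python
from collections.abc import Callable


def filter_without_init(lines: list[str]) -> list[str]:
    """Hypothetical buggy pattern: keep is only bound once a header is seen."""
    out = []
    for line in lines:
        if line.startswith(">"):
            keep = line[1:].strip() == "chr1"
        # If no header has been seen yet, `keep` was never assigned.
        if keep:
            out.append(line)
    return out


def filter_with_init(lines: list[str]) -> list[str]:
    """Same loop with the fix: keep has a value before the first iteration."""
    out = []
    keep = False
    for line in lines:
        if line.startswith(">"):
            keep = line[1:].strip() == "chr1"
        if keep:
            out.append(line)
    return out


def raises_unbound(fn: Callable[[list[str]], list[str]], lines: list[str]) -> bool:
    """Return True if fn(lines) raises UnboundLocalError."""
    try:
        fn(lines)
    except UnboundLocalError:
        return True
    return False
```

With input that starts with a header, both versions behave identically. They differ only when sequence data precedes the first header: the fixed version then silently drops the leading lines.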
Adds tests/tools/test_rewrite_fasta.py covering rewrite_fasta() using
pytest's tmp_path fixture:
- Canonical autosomes (chr1, chr22) are kept
- Sex chromosomes (chrX, chrY) and chrMT are kept
- Alt contigs (chr1_alt) and decoys (chrUn_gl000220, chrEBV) are filtered
- Mixed input with canonical and non-canonical contigs interleaved
- Empty input produces empty output
- Multi-line sequences are written in full

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ameynert force-pushed the am_04_rewrite_fasta branch from a2f8fa4 to d07c2c6 April 14, 2026 20:47
ameynert temporarily deployed to github-actions-snakemake-linting April 14, 2026 20:47 with GitHub Actions (Inactive)
ameynert merged commit d33f60f into main Apr 14, 2026
4 checks passed
ameynert deleted the am_04_rewrite_fasta branch April 14, 2026 20:52