Skip to content

feat: get basic tests working#18

Merged
ameynert merged 5 commits intomainfrom
am_03a_haplotype_utils_tests
Apr 14, 2026
Merged

feat: get basic tests working#18
ameynert merged 5 commits intomainfrom
am_03a_haplotype_utils_tests

Conversation

@ameynert
Copy link
Copy Markdown
Collaborator

@ameynert ameynert commented Apr 13, 2026

Summary by CodeRabbit

  • New Features

    • Sequence operations now accept a selectable reference genome (default: GRCh38).
  • Bug Fixes

    • Simplified and improved haplotype breakpoint detection for more accurate splitting.
    • Requesting an unregistered reference genome now raises a clear error.
  • Tests

    • Added unit tests, a small FASTA/FAI test dataset, and fixtures to support reference-genome-aware testing.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

📝 Walkthrough

Walkthrough

The PR adds a configurable reference_genome parameter to get_haplo_sequence, narrows split_haplotypes typing and simplifies its breakpoint logic, and introduces pytest fixtures plus FASTA/FAI test data and tests that validate sequence retrieval against a registered reference genome.

Changes

Cohort / File(s) Summary
Core Functionality
divref/divref/haplotype.py
Added reference_genome: str = "GRCh38" parameter to get_haplo_sequence and passed it to hl.get_sequence; tightened split_haplotypes signature to hl.Table and simplified the breakpoint filter by removing the `(i == 0)
Test Fixtures
divref/tests/conftest.py
Added datadir() fixture returning tests/data/ path and hail_reference_genome() fixture that constructs a minimal in-memory hl.ReferenceGenome with chr1 length 1000.
Test Data
divref/tests/data/test.fa, divref/tests/data/test.fa.fai
Added FASTA (test.fa) containing chr1 sequence and its FAI index (test.fa.fai) for test reference registration.
Tests
divref/tests/test_haplotype.py
Refactored test helpers to include contig in variant loci, switched to import hail as hl, added test_get_haplo_sequence_single and test_get_haplo_sequence_invalid_reference_genome_raises, and updated split_haplotypes tests to use contig-qualified variants.

Sequence Diagram(s)

(omitted — changes do not introduce multi-component sequential control flow requiring visualization)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Poem

🐰 I hopped through code with careful paws,

I added genomes and cleared the clause.
chr1 hums soft in FASTA light,
Tests confirm the sequence bright,
A rabbit cheers — the build takes flight. 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.82% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'feat: get basic tests working' is vague and generic, using non-descriptive language that doesn't convey specific information about the changeset. Replace with a more specific title that reflects the main change, such as 'feat: add haplotype tests with reference genome support' or 'feat: implement test infrastructure for haplotype utilities'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch am_03a_haplotype_utils_tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@ameynert ameynert temporarily deployed to github-actions-snakemake-linting April 14, 2026 17:00 — with GitHub Actions Inactive
@ameynert ameynert marked this pull request as ready for review April 14, 2026 17:00
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
divref/tests/conftest.py (2)

30-38: Add hail_context dependency and improve type annotation.

Two concerns:

  1. Missing fixture dependency: This fixture creates a Hail object but doesn't depend on hail_context. If a test requests only hail_reference_genome without hail_context, Hail won't be initialized, causing a runtime error.

  2. Any return type: The coding guidelines recommend avoiding Any. Consider using hl.ReferenceGenome directly if Hail exports the type, or a protocol/type alias.

🔧 Proposed fix
 `@pytest.fixture`
-def hail_reference_genome() -> Any:
+def hail_reference_genome(hail_context: None) -> Any:  # noqa: ARG001
     """A small custom reference genome for use in testing."""
     contigs: list[str] = ["chr1"]
     lengths: dict[str, int] = {"chr1": 1000}
 
     reference_genome = hl.ReferenceGenome(
         "test_chr1", contigs, lengths, x_contigs=[], y_contigs=[], mt_contigs=[]
     )
     return reference_genome
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/tests/conftest.py` around lines 30 - 38, The fixture
hail_reference_genome must depend on the hail_context fixture and should return
a concrete Hail type instead of Any; update the fixture signature to accept
hail_context (to ensure Hail is initialized) and change the return annotation to
hl.ReferenceGenome (or an appropriate type alias/protocol if needed), then
return the created reference_genome instance from the function; reference the
hail_reference_genome fixture and hail_context dependency so tests that request
only hail_reference_genome will have Hail initialized.

25-27: Add docstring to datadir fixture.

As per coding guidelines, docstrings are required on all public functions/classes.

📝 Proposed fix
 `@pytest.fixture`
 def datadir() -> Path:
+    """Return the path to the test data directory."""
     return Path(__file__).parent / "data"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/tests/conftest.py` around lines 25 - 27, Add a docstring to the public
pytest fixture function datadir in conftest.py: immediately under the def
datadir() -> Path: line add a concise triple-quoted docstring that states this
fixture returns the test data directory as a pathlib.Path (the "data"
subdirectory next to conftest.py) and any usage notes (e.g., relative to tests).
Ensure the docstring is a normal Python docstring ("""...""") and keep it brief
and descriptive so linters recognize the fixture as documented.
divref/tests/data/test.fa (1)

1-1: Non-standard FASTA header format.

The header > chr1 includes a space between > and the sequence name, which is unconventional (standard is >chr1). While this works, it may cause issues with some FASTA parsers. Consider removing the space for better compatibility with external tools.

💡 Suggested fix
-> chr1
+>chr1

If changed, update test.fa.fai offset from 7 to 6.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/tests/data/test.fa` at line 1, Change the FASTA header from the
non-standard "> chr1" to ">chr1" to remove the space and improve compatibility
with parsers; after updating the header string adjust the corresponding index in
the FASTA index file by changing the offset value referenced as "7" to "6" so
the test.fa.fai remains consistent with the modified header.
divref/tests/test_haplotype.py (1)

23-42: Schema mismatch: variant_type is missing contig field.

The variant_type schema at line 24 defines locus with only position, but the dict literals at line 31 also only include position. This works for split_haplotypes tests (which only use position for distance calculations), but creates an inconsistency with _make_variant which includes contig.

Consider aligning the schema if split_haplotypes will ever need the contig:

💡 Optional: Add contig to schema for consistency
-    variant_type = hl.tstruct(locus=hl.tstruct(position=hl.tint32), alleles=hl.tarray(hl.tstr))
+    variant_type = hl.tstruct(locus=hl.tstruct(contig=hl.tstr, position=hl.tint32), alleles=hl.tarray(hl.tstr))
     ...
     variants = [
-        {"locus": {"position": pos}, "alleles": [ref, alt]} for pos, ref, alt in variant_positions
+        {"locus": {"contig": "chr1", "position": pos}, "alleles": [ref, alt]} for pos, ref, alt in variant_positions
     ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/tests/test_haplotype.py` around lines 23 - 42, The schema for
variant_type in _make_haplotype_table is missing the locus.contig field; update
variant_type (and the row_type if needed) to include locus.contig and also
update the constructed variant dicts in _make_haplotype_table to provide a
contig value (e.g., a default like "1" or reuse the same contig semantics as
_make_variant) so the table schema matches _make_variant and any consumers like
split_haplotypes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@divref/tests/conftest.py`:
- Around line 30-38: The fixture hail_reference_genome must depend on the
hail_context fixture and should return a concrete Hail type instead of Any;
update the fixture signature to accept hail_context (to ensure Hail is
initialized) and change the return annotation to hl.ReferenceGenome (or an
appropriate type alias/protocol if needed), then return the created
reference_genome instance from the function; reference the hail_reference_genome
fixture and hail_context dependency so tests that request only
hail_reference_genome will have Hail initialized.
- Around line 25-27: Add a docstring to the public pytest fixture function
datadir in conftest.py: immediately under the def datadir() -> Path: line add a
concise triple-quoted docstring that states this fixture returns the test data
directory as a pathlib.Path (the "data" subdirectory next to conftest.py) and
any usage notes (e.g., relative to tests). Ensure the docstring is a normal
Python docstring ("""...""") and keep it brief and descriptive so linters
recognize the fixture as documented.

In `@divref/tests/data/test.fa`:
- Line 1: Change the FASTA header from the non-standard "> chr1" to ">chr1" to
remove the space and improve compatibility with parsers; after updating the
header string adjust the corresponding index in the FASTA index file by changing
the offset value referenced as "7" to "6" so the test.fa.fai remains consistent
with the modified header.

In `@divref/tests/test_haplotype.py`:
- Around line 23-42: The schema for variant_type in _make_haplotype_table is
missing the locus.contig field; update variant_type (and the row_type if needed)
to include locus.contig and also update the constructed variant dicts in
_make_haplotype_table to provide a contig value (e.g., a default like "1" or
reuse the same contig semantics as _make_variant) so the table schema matches
_make_variant and any consumers like split_haplotypes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a84962be-2864-41be-8dd1-eca320113fde

📥 Commits

Reviewing files that changed from the base of the PR and between 403ece8 and b001566.

📒 Files selected for processing (5)
  • divref/divref/haplotype.py
  • divref/tests/conftest.py
  • divref/tests/data/test.fa
  • divref/tests/data/test.fa.fai
  • divref/tests/test_haplotype.py

@ameynert ameynert temporarily deployed to github-actions-snakemake-linting April 14, 2026 17:39 — with GitHub Actions Inactive
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
divref/tests/conftest.py (1)

25-31: Add Google-style fixture docstrings for clarity and guideline compliance.

datadir currently has no docstring, and hail_reference_genome uses a minimal one-liner. Please document fixture behavior/contracts with Google-style sections (Returns: and Args: where applicable).

As per coding guidelines, "Docstrings are required on all public functions/classes" and "Use Google-style docstrings with Args:, Returns:, Yields:, and Raises: blocks".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/tests/conftest.py` around lines 25 - 31, Add Google-style docstrings
to the datadir and hail_reference_genome fixtures: for datadir, add a docstring
describing it returns a Path pointing to the test data directory and include a
Returns: section specifying a pathlib.Path; for hail_reference_genome, expand
the one-line summary into a Google-style docstring that documents Args:
(hail_context: HL context fixture), Returns: (hl.ReferenceGenome), and any
relevant behavior/contract (e.g., that it provides a small custom reference
genome for tests). Update the docstrings on the functions named datadir and
hail_reference_genome accordingly.
divref/tests/test_haplotype.py (1)

19-23: Replace Any annotations with concrete types in test helpers/fixtures.

Using Any here weakens static guarantees. Prefer concrete Hail types (or a local type alias) for _make_variant, _make_haplotype_table, and hail_reference_genome in the test signature.

Proposed typing cleanup
-from typing import Any
-
-def _make_variant(position: int, ref: str, alt: str) -> Any:
+def _make_variant(position: int, ref: str, alt: str) -> hl.Struct:
     return hl.Struct(locus=hl.Struct(contig="chr1", position=position), alleles=[ref, alt])

-def _make_haplotype_table(variant_positions: list[tuple[str, int, str, str]]) -> Any:
+def _make_haplotype_table(variant_positions: list[tuple[str, int, str, str]]) -> hl.Table:
@@
 def test_get_haplo_sequence_single(
     datadir: Path,
-    hail_reference_genome: Any,
+    hail_reference_genome: hl.ReferenceGenome,
     hail_context: None,  # noqa: ARG001
 ) -> None:

As per coding guidelines, "Avoid Any in Python type annotations—prefer type alias or TypeVar".

Also applies to: 53-56

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/tests/test_haplotype.py` around lines 19 - 23, Tests use `Any` for
helper signatures which weakens type checking; replace `Any` in _make_variant,
_make_haplotype_table (and the hail_reference_genome test param) with concrete
Hail types or a local type alias (e.g., aliasing hail.utils.StructExpression or
hail.Table for the expected returns/params) so static analysis can validate
usage; update the function signatures for _make_variant to return the specific
Hail struct type for a locus/alleles, and for _make_haplotype_table to
accept/return a typed hail Table or Sequence of typed tuples, creating a small
local type alias at the top of the test file if needed to keep signatures
readable (refer to symbols _make_variant, _make_haplotype_table,
hail_reference_genome).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@divref/tests/test_haplotype.py`:
- Around line 53-69: Add a negative test that exercises the error path for
get_haplo_sequence when an unregistered/invalid reference genome name is passed:
call get_haplo_sequence with reference_genome set to a string that was not added
via hail_reference_genome.add_sequence (or remove the registration step), then
assert it raises the expected exception (e.g., ValueError or HailError) or
returns the documented error; reference the existing test flow around
get_haplo_sequence, hail_reference_genome.add_sequence, and _make_variant to
mirror setup but trigger the invalid reference path and assert the specific
error type/message.

---

Nitpick comments:
In `@divref/tests/conftest.py`:
- Around line 25-31: Add Google-style docstrings to the datadir and
hail_reference_genome fixtures: for datadir, add a docstring describing it
returns a Path pointing to the test data directory and include a Returns:
section specifying a pathlib.Path; for hail_reference_genome, expand the
one-line summary into a Google-style docstring that documents Args:
(hail_context: HL context fixture), Returns: (hl.ReferenceGenome), and any
relevant behavior/contract (e.g., that it provides a small custom reference
genome for tests). Update the docstrings on the functions named datadir and
hail_reference_genome accordingly.

In `@divref/tests/test_haplotype.py`:
- Around line 19-23: Tests use `Any` for helper signatures which weakens type
checking; replace `Any` in _make_variant, _make_haplotype_table (and the
hail_reference_genome test param) with concrete Hail types or a local type alias
(e.g., aliasing hail.utils.StructExpression or hail.Table for the expected
returns/params) so static analysis can validate usage; update the function
signatures for _make_variant to return the specific Hail struct type for a
locus/alleles, and for _make_haplotype_table to accept/return a typed hail Table
or Sequence of typed tuples, creating a small local type alias at the top of the
test file if needed to keep signatures readable (refer to symbols _make_variant,
_make_haplotype_table, hail_reference_genome).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e9c71e56-8b97-47ff-91a8-81a9653973d8

📥 Commits

Reviewing files that changed from the base of the PR and between b001566 and abfdc2e.

📒 Files selected for processing (4)
  • divref/tests/conftest.py
  • divref/tests/data/test.fa
  • divref/tests/data/test.fa.fai
  • divref/tests/test_haplotype.py
✅ Files skipped from review due to trivial changes (2)
  • divref/tests/data/test.fa
  • divref/tests/data/test.fa.fai

@ameynert ameynert temporarily deployed to github-actions-snakemake-linting April 14, 2026 18:14 — with GitHub Actions Inactive
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
divref/tests/test_haplotype.py (1)

151-179: ⚠️ Potential issue | 🟡 Minor

Add the == window_size regression case.

These tests cover gaps smaller than and larger than window_size, but the changed split condition is the inclusive >= window_size boundary. One exact-threshold case would lock down the bug fix much better.

As per coding guidelines, "bug fixes should include regression test".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/tests/test_haplotype.py` around lines 151 - 179, Add a regression test
that covers the boundary case where the inter-variant gap equals window_size:
create a haplotype table via _make_haplotype_table with variants whose positions
have a gap exactly window_size apart, call split_haplotypes(...,
window_size=XYZ) and assert the behavior matches the inclusive change (i.e.,
segments should split when gap >= window_size). Place this new test alongside
test_split_haplotypes_splits_at_large_gap and
test_split_haplotypes_discards_singleton_segment, using the same patterns
(collect(), checking len(rows) and the variant positions) to lock in the >=
window_size behavior.
divref/divref/haplotype.py (1)

41-47: ⚠️ Potential issue | 🟡 Minor

Document the invalid-reference failure in Raises:.

The new reference_genome parameter adds a public error path, and divref/tests/test_haplotype.py Lines 70-76 now lock KeyError in as observable behavior. The docstring should reflect that, or the function should normalize the exception before exposing it.

As per coding guidelines, "Update docstrings when function signature or behavior changes".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/divref/haplotype.py` around lines 41 - 47, The haplotype function's
public parameter reference_genome can now surface a KeyError when the reference
lookup fails; update the haplotype docstring Raises section to list KeyError
(e.g., "KeyError: If the provided reference_genome is not found/invalid") so
callers are informed, referencing the reference_genome parameter and the
haplotype function, or alternatively normalize the error inside haplotype by
catching KeyError from the reference lookup and re-raising a ValueError with a
clear message about an invalid reference_genome.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@divref/tests/test_haplotype.py`:
- Around line 75-76: The test's call to get_haplo_sequence inside the
pytest.raises block is exceeding line-length (Ruff E501); reformat the
pytest.raises block by breaking the get_haplo_sequence invocation across
multiple lines so each argument is on its own line and the call fits within
line-length limits (update the with pytest.raises(KeyError,
match="nonexistent_genome") block to contain a multi-line call to
get_haplo_sequence with context_size=2, variants=[variant], and
reference_genome="nonexistent_genome"). Ensure indentation stays consistent and
the pytest.raises context encloses the wrapped call.

---

Outside diff comments:
In `@divref/divref/haplotype.py`:
- Around line 41-47: The haplotype function's public parameter reference_genome
can now surface a KeyError when the reference lookup fails; update the haplotype
docstring Raises section to list KeyError (e.g., "KeyError: If the provided
reference_genome is not found/invalid") so callers are informed, referencing the
reference_genome parameter and the haplotype function, or alternatively
normalize the error inside haplotype by catching KeyError from the reference
lookup and re-raising a ValueError with a clear message about an invalid
reference_genome.

In `@divref/tests/test_haplotype.py`:
- Around line 151-179: Add a regression test that covers the boundary case where
the inter-variant gap equals window_size: create a haplotype table via
_make_haplotype_table with variants whose positions have a gap exactly
window_size apart, call split_haplotypes(..., window_size=XYZ) and assert the
behavior matches the inclusive change (i.e., segments should split when gap >=
window_size). Place this new test alongside
test_split_haplotypes_splits_at_large_gap and
test_split_haplotypes_discards_singleton_segment, using the same patterns
(collect(), checking len(rows) and the variant positions) to lock in the >=
window_size behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3fbb19ab-96cc-495e-90b8-7c3feaf77f6f

📥 Commits

Reviewing files that changed from the base of the PR and between abfdc2e and 6a00d88.

📒 Files selected for processing (3)
  • divref/divref/haplotype.py
  • divref/tests/conftest.py
  • divref/tests/test_haplotype.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • divref/tests/conftest.py

@ameynert ameynert temporarily deployed to github-actions-snakemake-linting April 14, 2026 18:36 — with GitHub Actions Inactive
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
divref/tests/test_haplotype.py (1)

22-22: Broaden helper input type on Line 22.

_make_haplotype_table can accept a more general collection type than list[...], which makes the helper easier to reuse without changing behavior.

Suggested refactor
-def _make_haplotype_table(variant_positions: list[tuple[str, int, str, str]]) -> hl.Table:
+from collections.abc import Iterable
+
+def _make_haplotype_table(
+    variant_positions: Iterable[tuple[str, int, str, str]]
+) -> hl.Table:

As per coding guidelines, "Accept the most general type practical for parameters (e.g., Iterable over List)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/tests/test_haplotype.py` at line 22, Change _make_haplotype_table to
accept a more general collection type instead of list: update the parameter type
from list[tuple[str, int, str, str]] to Iterable[tuple[str, int, str, str]] (or
Sequence[...] if you need indexing) in the function signature for
_make_haplotype_table; import the appropriate typing name (Iterable or Sequence)
and ensure any code inside _make_haplotype_table does not rely on list-specific
methods (if it does, convert to a list locally with list(...)) so behavior
remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@divref/tests/test_haplotype.py`:
- Line 22: Change _make_haplotype_table to accept a more general collection type
instead of list: update the parameter type from list[tuple[str, int, str, str]]
to Iterable[tuple[str, int, str, str]] (or Sequence[...] if you need indexing)
in the function signature for _make_haplotype_table; import the appropriate
typing name (Iterable or Sequence) and ensure any code inside
_make_haplotype_table does not rely on list-specific methods (if it does,
convert to a list locally with list(...)) so behavior remains unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: deb78240-ed70-4ee2-b177-b6fd3050b384

📥 Commits

Reviewing files that changed from the base of the PR and between 6a00d88 and bd6131e.

📒 Files selected for processing (1)
  • divref/tests/test_haplotype.py

@ameynert ameynert merged commit 0034a61 into main Apr 14, 2026
4 checks passed
@ameynert ameynert deleted the am_03a_haplotype_utils_tests branch April 14, 2026 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant