Conversation
📝 WalkthroughWalkthroughThe PR adds a configurable Changes
Sequence Diagram(s)(omitted — changes do not introduce multi-component sequential control flow requiring visualization) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (4)
divref/tests/conftest.py (2)
30-38: Addhail_contextdependency and improve type annotation.Two concerns:
Missing fixture dependency: This fixture creates a Hail object but doesn't depend on
hail_context. If a test requests onlyhail_reference_genomewithouthail_context, Hail won't be initialized, causing a runtime error.
Anyreturn type: The coding guidelines recommend avoidingAny. Consider usinghl.ReferenceGenomedirectly if Hail exports the type, or a protocol/type alias.🔧 Proposed fix
`@pytest.fixture` -def hail_reference_genome() -> Any: +def hail_reference_genome(hail_context: None) -> Any: # noqa: ARG001 """A small custom reference genome for use in testing.""" contigs: list[str] = ["chr1"] lengths: dict[str, int] = {"chr1": 1000} reference_genome = hl.ReferenceGenome( "test_chr1", contigs, lengths, x_contigs=[], y_contigs=[], mt_contigs=[] ) return reference_genome🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@divref/tests/conftest.py` around lines 30 - 38, The fixture hail_reference_genome must depend on the hail_context fixture and should return a concrete Hail type instead of Any; update the fixture signature to accept hail_context (to ensure Hail is initialized) and change the return annotation to hl.ReferenceGenome (or an appropriate type alias/protocol if needed), then return the created reference_genome instance from the function; reference the hail_reference_genome fixture and hail_context dependency so tests that request only hail_reference_genome will have Hail initialized.
25-27: Add docstring todatadirfixture.As per coding guidelines, docstrings are required on all public functions/classes.
📝 Proposed fix
`@pytest.fixture` def datadir() -> Path: + """Return the path to the test data directory.""" return Path(__file__).parent / "data"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@divref/tests/conftest.py` around lines 25 - 27, Add a docstring to the public pytest fixture function datadir in conftest.py: immediately under the def datadir() -> Path: line add a concise triple-quoted docstring that states this fixture returns the test data directory as a pathlib.Path (the "data" subdirectory next to conftest.py) and any usage notes (e.g., relative to tests). Ensure the docstring is a normal Python docstring ("""...""") and keep it brief and descriptive so linters recognize the fixture as documented.divref/tests/data/test.fa (1)
1-1: Non-standard FASTA header format.The header
> chr1includes a space between>and the sequence name, which is unconventional (standard is>chr1). While this works, it may cause issues with some FASTA parsers. Consider removing the space for better compatibility with external tools.💡 Suggested fix
-> chr1 +>chr1If changed, update
test.fa.faioffset from7to6.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@divref/tests/data/test.fa` at line 1, Change the FASTA header from the non-standard "> chr1" to ">chr1" to remove the space and improve compatibility with parsers; after updating the header string adjust the corresponding index in the FASTA index file by changing the offset value referenced as "7" to "6" so the test.fa.fai remains consistent with the modified header.divref/tests/test_haplotype.py (1)
23-42: Schema mismatch:variant_typeis missingcontigfield.The
variant_typeschema at line 24 defineslocuswith onlyposition, but the dict literals at line 31 also only includeposition. This works forsplit_haplotypestests (which only usepositionfor distance calculations), but creates an inconsistency with_make_variantwhich includescontig.Consider aligning the schema if
split_haplotypeswill ever need the contig:💡 Optional: Add contig to schema for consistency
- variant_type = hl.tstruct(locus=hl.tstruct(position=hl.tint32), alleles=hl.tarray(hl.tstr)) + variant_type = hl.tstruct(locus=hl.tstruct(contig=hl.tstr, position=hl.tint32), alleles=hl.tarray(hl.tstr)) ... variants = [ - {"locus": {"position": pos}, "alleles": [ref, alt]} for pos, ref, alt in variant_positions + {"locus": {"contig": "chr1", "position": pos}, "alleles": [ref, alt]} for pos, ref, alt in variant_positions ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@divref/tests/test_haplotype.py` around lines 23 - 42, The schema for variant_type in _make_haplotype_table is missing the locus.contig field; update variant_type (and the row_type if needed) to include locus.contig and also update the constructed variant dicts in _make_haplotype_table to provide a contig value (e.g., a default like "1" or reuse the same contig semantics as _make_variant) so the table schema matches _make_variant and any consumers like split_haplotypes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@divref/tests/conftest.py`:
- Around line 30-38: The fixture hail_reference_genome must depend on the
hail_context fixture and should return a concrete Hail type instead of Any;
update the fixture signature to accept hail_context (to ensure Hail is
initialized) and change the return annotation to hl.ReferenceGenome (or an
appropriate type alias/protocol if needed), then return the created
reference_genome instance from the function; reference the hail_reference_genome
fixture and hail_context dependency so tests that request only
hail_reference_genome will have Hail initialized.
- Around line 25-27: Add a docstring to the public pytest fixture function
datadir in conftest.py: immediately under the def datadir() -> Path: line add a
concise triple-quoted docstring that states this fixture returns the test data
directory as a pathlib.Path (the "data" subdirectory next to conftest.py) and
any usage notes (e.g., relative to tests). Ensure the docstring is a normal
Python docstring ("""...""") and keep it brief and descriptive so linters
recognize the fixture as documented.
In `@divref/tests/data/test.fa`:
- Line 1: Change the FASTA header from the non-standard "> chr1" to ">chr1" to
remove the space and improve compatibility with parsers; after updating the
header string adjust the corresponding index in the FASTA index file by changing
the offset value referenced as "7" to "6" so the test.fa.fai remains consistent
with the modified header.
In `@divref/tests/test_haplotype.py`:
- Around line 23-42: The schema for variant_type in _make_haplotype_table is
missing the locus.contig field; update variant_type (and the row_type if needed)
to include locus.contig and also update the constructed variant dicts in
_make_haplotype_table to provide a contig value (e.g., a default like "1" or
reuse the same contig semantics as _make_variant) so the table schema matches
_make_variant and any consumers like split_haplotypes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a84962be-2864-41be-8dd1-eca320113fde
📒 Files selected for processing (5)
divref/divref/haplotype.pydivref/tests/conftest.pydivref/tests/data/test.fadivref/tests/data/test.fa.faidivref/tests/test_haplotype.py
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
divref/tests/conftest.py (1)
25-31: Add Google-style fixture docstrings for clarity and guideline compliance.
datadircurrently has no docstring, andhail_reference_genomeuses a minimal one-liner. Please document fixture behavior/contracts with Google-style sections (Returns:andArgs:where applicable).As per coding guidelines, "Docstrings are required on all public functions/classes" and "Use Google-style docstrings with
Args:,Returns:,Yields:, andRaises:blocks".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@divref/tests/conftest.py` around lines 25 - 31, Add Google-style docstrings to the datadir and hail_reference_genome fixtures: for datadir, add a docstring describing it returns a Path pointing to the test data directory and include a Returns: section specifying a pathlib.Path; for hail_reference_genome, expand the one-line summary into a Google-style docstring that documents Args: (hail_context: HL context fixture), Returns: (hl.ReferenceGenome), and any relevant behavior/contract (e.g., that it provides a small custom reference genome for tests). Update the docstrings on the functions named datadir and hail_reference_genome accordingly.divref/tests/test_haplotype.py (1)
19-23: ReplaceAnyannotations with concrete types in test helpers/fixtures.Using
Anyhere weakens static guarantees. Prefer concrete Hail types (or a local type alias) for_make_variant,_make_haplotype_table, andhail_reference_genomein the test signature.Proposed typing cleanup
-from typing import Any - -def _make_variant(position: int, ref: str, alt: str) -> Any: +def _make_variant(position: int, ref: str, alt: str) -> hl.Struct: return hl.Struct(locus=hl.Struct(contig="chr1", position=position), alleles=[ref, alt]) -def _make_haplotype_table(variant_positions: list[tuple[str, int, str, str]]) -> Any: +def _make_haplotype_table(variant_positions: list[tuple[str, int, str, str]]) -> hl.Table: @@ def test_get_haplo_sequence_single( datadir: Path, - hail_reference_genome: Any, + hail_reference_genome: hl.ReferenceGenome, hail_context: None, # noqa: ARG001 ) -> None:As per coding guidelines, "Avoid
Anyin Python type annotations—prefer type alias orTypeVar".Also applies to: 53-56
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@divref/tests/test_haplotype.py` around lines 19 - 23, Tests use `Any` for helper signatures which weakens type checking; replace `Any` in _make_variant, _make_haplotype_table (and the hail_reference_genome test param) with concrete Hail types or a local type alias (e.g., aliasing hail.utils.StructExpression or hail.Table for the expected returns/params) so static analysis can validate usage; update the function signatures for _make_variant to return the specific Hail struct type for a locus/alleles, and for _make_haplotype_table to accept/return a typed hail Table or Sequence of typed tuples, creating a small local type alias at the top of the test file if needed to keep signatures readable (refer to symbols _make_variant, _make_haplotype_table, hail_reference_genome).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@divref/tests/test_haplotype.py`:
- Around line 53-69: Add a negative test that exercises the error path for
get_haplo_sequence when an unregistered/invalid reference genome name is passed:
call get_haplo_sequence with reference_genome set to a string that was not added
via hail_reference_genome.add_sequence (or remove the registration step), then
assert it raises the expected exception (e.g., ValueError or HailError) or
returns the documented error; reference the existing test flow around
get_haplo_sequence, hail_reference_genome.add_sequence, and _make_variant to
mirror setup but trigger the invalid reference path and assert the specific
error type/message.
---
Nitpick comments:
In `@divref/tests/conftest.py`:
- Around line 25-31: Add Google-style docstrings to the datadir and
hail_reference_genome fixtures: for datadir, add a docstring describing it
returns a Path pointing to the test data directory and include a Returns:
section specifying a pathlib.Path; for hail_reference_genome, expand the
one-line summary into a Google-style docstring that documents Args:
(hail_context: HL context fixture), Returns: (hl.ReferenceGenome), and any
relevant behavior/contract (e.g., that it provides a small custom reference
genome for tests). Update the docstrings on the functions named datadir and
hail_reference_genome accordingly.
In `@divref/tests/test_haplotype.py`:
- Around line 19-23: Tests use `Any` for helper signatures which weakens type
checking; replace `Any` in _make_variant, _make_haplotype_table (and the
hail_reference_genome test param) with concrete Hail types or a local type alias
(e.g., aliasing hail.utils.StructExpression or hail.Table for the expected
returns/params) so static analysis can validate usage; update the function
signatures for _make_variant to return the specific Hail struct type for a
locus/alleles, and for _make_haplotype_table to accept/return a typed hail Table
or Sequence of typed tuples, creating a small local type alias at the top of the
test file if needed to keep signatures readable (refer to symbols _make_variant,
_make_haplotype_table, hail_reference_genome).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e9c71e56-8b97-47ff-91a8-81a9653973d8
📒 Files selected for processing (4)
divref/tests/conftest.pydivref/tests/data/test.fadivref/tests/data/test.fa.faidivref/tests/test_haplotype.py
✅ Files skipped from review due to trivial changes (2)
- divref/tests/data/test.fa
- divref/tests/data/test.fa.fai
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
divref/tests/test_haplotype.py (1)
151-179:⚠️ Potential issue | 🟡 MinorAdd the
== window_sizeregression case.These tests cover gaps smaller than and larger than
window_size, but the changed split condition is the inclusive>= window_sizeboundary. One exact-threshold case would lock down the bug fix much better.As per coding guidelines, "bug fixes should include regression test".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@divref/tests/test_haplotype.py` around lines 151 - 179, Add a regression test that covers the boundary case where the inter-variant gap equals window_size: create a haplotype table via _make_haplotype_table with variants whose positions have a gap exactly window_size apart, call split_haplotypes(..., window_size=XYZ) and assert the behavior matches the inclusive change (i.e., segments should split when gap >= window_size). Place this new test alongside test_split_haplotypes_splits_at_large_gap and test_split_haplotypes_discards_singleton_segment, using the same patterns (collect(), checking len(rows) and the variant positions) to lock in the >= window_size behavior.divref/divref/haplotype.py (1)
41-47:⚠️ Potential issue | 🟡 MinorDocument the invalid-reference failure in
Raises:.The new
reference_genomeparameter adds a public error path, anddivref/tests/test_haplotype.pyLines 70-76 now lockKeyErrorin as observable behavior. The docstring should reflect that, or the function should normalize the exception before exposing it.As per coding guidelines, "Update docstrings when function signature or behavior changes".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@divref/divref/haplotype.py` around lines 41 - 47, The haplotype function's public parameter reference_genome can now surface a KeyError when the reference lookup fails; update the haplotype docstring Raises section to list KeyError (e.g., "KeyError: If the provided reference_genome is not found/invalid") so callers are informed, referencing the reference_genome parameter and the haplotype function, or alternatively normalize the error inside haplotype by catching KeyError from the reference lookup and re-raising a ValueError with a clear message about an invalid reference_genome.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@divref/tests/test_haplotype.py`:
- Around line 75-76: The test's call to get_haplo_sequence inside the
pytest.raises block is exceeding line-length (Ruff E501); reformat the
pytest.raises block by breaking the get_haplo_sequence invocation across
multiple lines so each argument is on its own line and the call fits within
line-length limits (update the with pytest.raises(KeyError,
match="nonexistent_genome") block to contain a multi-line call to
get_haplo_sequence with context_size=2, variants=[variant], and
reference_genome="nonexistent_genome"). Ensure indentation stays consistent and
the pytest.raises context encloses the wrapped call.
---
Outside diff comments:
In `@divref/divref/haplotype.py`:
- Around line 41-47: The haplotype function's public parameter reference_genome
can now surface a KeyError when the reference lookup fails; update the haplotype
docstring Raises section to list KeyError (e.g., "KeyError: If the provided
reference_genome is not found/invalid") so callers are informed, referencing the
reference_genome parameter and the haplotype function, or alternatively
normalize the error inside haplotype by catching KeyError from the reference
lookup and re-raising a ValueError with a clear message about an invalid
reference_genome.
In `@divref/tests/test_haplotype.py`:
- Around line 151-179: Add a regression test that covers the boundary case where
the inter-variant gap equals window_size: create a haplotype table via
_make_haplotype_table with variants whose positions have a gap exactly
window_size apart, call split_haplotypes(..., window_size=XYZ) and assert the
behavior matches the inclusive change (i.e., segments should split when gap >=
window_size). Place this new test alongside
test_split_haplotypes_splits_at_large_gap and
test_split_haplotypes_discards_singleton_segment, using the same patterns
(collect(), checking len(rows) and the variant positions) to lock in the >=
window_size behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3fbb19ab-96cc-495e-90b8-7c3feaf77f6f
📒 Files selected for processing (3)
divref/divref/haplotype.pydivref/tests/conftest.pydivref/tests/test_haplotype.py
🚧 Files skipped from review as they are similar to previous changes (1)
- divref/tests/conftest.py
There was a problem hiding this comment.
🧹 Nitpick comments (1)
divref/tests/test_haplotype.py (1)
22-22: Broaden helper input type on Line 22.
_make_haplotype_tablecan accept a more general collection type thanlist[...], which makes the helper easier to reuse without changing behavior.Suggested refactor
-def _make_haplotype_table(variant_positions: list[tuple[str, int, str, str]]) -> hl.Table: +from collections.abc import Iterable + +def _make_haplotype_table( + variant_positions: Iterable[tuple[str, int, str, str]] +) -> hl.Table:As per coding guidelines, "Accept the most general type practical for parameters (e.g., Iterable over List)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@divref/tests/test_haplotype.py` at line 22, Change _make_haplotype_table to accept a more general collection type instead of list: update the parameter type from list[tuple[str, int, str, str]] to Iterable[tuple[str, int, str, str]] (or Sequence[...] if you need indexing) in the function signature for _make_haplotype_table; import the appropriate typing name (Iterable or Sequence) and ensure any code inside _make_haplotype_table does not rely on list-specific methods (if it does, convert to a list locally with list(...)) so behavior remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@divref/tests/test_haplotype.py`:
- Line 22: Change _make_haplotype_table to accept a more general collection type
instead of list: update the parameter type from list[tuple[str, int, str, str]]
to Iterable[tuple[str, int, str, str]] (or Sequence[...] if you need indexing)
in the function signature for _make_haplotype_table; import the appropriate
typing name (Iterable or Sequence) and ensure any code inside
_make_haplotype_table does not rely on list-specific methods (if it does,
convert to a list locally with list(...)) so behavior remains unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: deb78240-ed70-4ee2-b177-b6fd3050b384
📒 Files selected for processing (1)
divref/tests/test_haplotype.py
Summary by CodeRabbit
New Features
Bug Fixes
Tests