Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
359 changes: 359 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,359 @@
# CLAUDE.md

## Template Setup

**On first interaction in a new project:** Check if these sections still contain placeholder text:
- Project Overview (look for `[Add 2-3 sentences`)
- Key Concepts (look for `[Term]:`)
- Codebase Architecture (look for `[Add tree output`)

If any are unpopulated, prompt the user to fill them in before proceeding with other work. Walk through each section conversationally—ask about the project, then draft content for the user to approve.

Delete this section once setup is complete.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Follow through on setup cleanup instruction.

Line 12 says to delete setup content once complete, but the setup/template section is still present with placeholders.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` at line 12, Remove the leftover setup/template section that
contains the placeholder line "Delete this section once setup is complete." and
any related placeholder instructions or TODOs in CLAUDE.md; replace it with the
finalized setup steps or delete the whole block so the file no longer includes
incomplete setup content or placeholders.


## How Claude Should Work

### Before Starting
- Confirm understanding of task scope before writing code
- For changes >400 lines or spanning multiple modules, propose an implementation plan first (see Planning Documents below)
- Identify which existing modules/patterns to follow
- Check for related issues/PRs with `gh issue list` or `gh pr list`

### During Development
- Commit at logical boundaries (see Commit Granularity below)
- When uncertain about a design choice, pause and ask rather than guess
- Note assumptions made and flag them for review

### Decision-Making
**Ask first:**
- Architectural changes or new patterns
- New dependencies
- Deviating from established patterns
- Changes to public APIs
- Performance vs. readability tradeoffs

**Decide and document:**
- Implementation details within established patterns
- Test organization and naming
- Internal/private naming choices
- Order of operations within a function

When uncertain, state your assumption explicitly: *"Assuming X—correct me if that's wrong."*

### When Claude Disagrees
- Explain the concern clearly with specifics
- Offer alternatives rather than just objecting
- For style preferences, defer to project conventions
- For correctness/safety issues, push back firmly but constructively

### Context Maintenance
- Reference specific file paths and function names, not vague descriptions
- When resuming work, summarize current state and next steps
- Proactively save plans and notes to `agent_notes/`

### Planning Documents
For changes >400 lines or spanning multiple modules, create a plan:

```markdown
# agent_notes/YYYY-MM-DD_brief-description.md

## Goal
[One sentence describing the end state]

## Approach
1. [First step]
2. [Second step]
...

## Files to Modify
- `src/module.py` — add/change X
- `tests/test_module.py` — add tests for X

## Open Questions
- [Anything requiring user input]

## Progress
- [ ] Step 1
- [ ] Step 2
```

---

## Project Overview

<!-- After creating a project from this template, populate this section. -->

[2-3 sentences: What does this toolkit do? What problems does it solve? Who uses it?]

### Key Concepts

<!-- List 5-10 domain terms that appear in code but may be unfamiliar.
Delete this comment block after populating. -->

- **[Term]:** Brief definition (e.g., "BAM files: Binary alignment/map files containing sequencing read alignments")
- **[Term]:** Brief definition

### Codebase Architecture

<!-- Add `tree -L 2 -d` output or equivalent, annotated with module purposes.
Update when adding top-level modules. Delete this comment block. -->

```
project/
├── src/
├── tests/
└── ...
```
Comment on lines +82 to +106
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Populate or remove template placeholders before merging.

Project Overview, Key Concepts, and Codebase Architecture still contain template placeholders, so this reads as an unfinished scaffold rather than project guidance.

Based on learnings: Check if Project Overview, Key Concepts, and Codebase Architecture sections contain placeholder text on first interaction, and prompt user to populate before proceeding.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` around lines 82 - 106, The CLAUDE.md file still contains template
placeholders in the "Project Overview", "Key Concepts", and "Codebase
Architecture" sections; replace or populate those placeholders with real project
text (2–3 sentence overview, 5–10 domain terms with brief definitions, and an
updated directory tree/architecture) and remove the HTML comment hints and
example blocks so the document is production-ready. Additionally, implement a
first-interaction validation that inspects CLAUDE.md for leftover placeholder
patterns (e.g., literal phrases like "[2-3 sentences", "<!--", or the sample
tree block) and prompts the user to populate those sections before allowing
merge or continuing (add this check into your onboarding/validation flow where
file checks occur). Ensure log/prompt text references the specific sections
"Project Overview", "Key Concepts", and "Codebase Architecture" so users know
exactly what to fill.


---

## Core Principles

**Priority order:** Correctness → Readability → Simplicity → Performance

1. **Correctness:** Structure code to be testable; isolate complex logic for unit testing.
2. **Readability:** If you'd change it on a rewrite, refactor now.
3. **Simplicity:** Prefer recognizable patterns. Don't introduce new patterns without discussion.
4. **Performance:** Correct implementation first. Profile before optimizing (see Performance section).

---

## Development Commands

```bash
# Run all checks (lint, type-check, test)
uv run poe fix-and-check-all

# Individual commands
uv run poe check-tests # Run tests
uv run poe check-lints # Lint only
uv run poe check-typing # Type-check only

# Run specific test
uv run pytest tests/test_example.py::test_name -v

# GitHub CLI
gh issue list # View open issues
gh pr list # View open PRs
gh pr view 123 # View specific PR
```

See `CONTRIBUTING.md` for full task list and environment setup.

---

## Git Workflow

### Commit Granularity

Commit after completing one of:
- A single function/method implementation
- One refactoring step (rename, extract, move)
- A bug fix with its regression test
- A documentation update

**Size guidelines:**
- Per commit: 100–300 lines preferred, 400 max
- Per PR: No hard limit, but consider splitting if >800 lines or >5 unrelated files

**Good commit scope examples:**
- `Add FastaIndex.validate() method`
- `Rename species_map → species_to_ref_fasta_map`
- `Fix off-by-one in BED coordinate parsing`

### Commit Messages

```
Concise title in imperative mood (<72 chars)

Detailed body explaining:
- What changed
- Why (link issues with "Closes #123" or "Related to #456")
- Any non-obvious implementation choices
```

### Commit Rules
- Run `uv run poe fix-and-check-all` before each commit; all checks must pass
- No merge commits
- Do not rebase without explicit user approval
- Use `git mv` for file moves; if moving *and* editing, make two commits (move first, then edits)
- **Never mix formatting and functional changes.** If unavoidable, isolate formatting into separate commits at start or end of branch.

### Pull Requests
- Title: Imperative mood, <72 chars (e.g., "Add FASTA index validation")
- Body: What changed, why, testing done, migration notes if applicable
- Link issues: "Closes #123" or "Related to #456"

### Branch Hygiene
- Use `.gitignore` liberally
- Never commit: IDE files, personal test files, local debug data, commented-out code

---

## Code Review

### As Reviewer
- Read every line of code
- Clearly distinguish suggestions from required changes; include code alternatives
- Focus on substance—avoid bikeshedding

### Review Checklist

**Correctness & Design:**
- [ ] Does the code do what it claims?
- [ ] Is it structured for testability (modular, injectable dependencies)?
- [ ] Is it appropriately generalized—not over-engineered or over-optimized?

**Readability & Style:**
- [ ] Is it readable with clear names?
- [ ] Is it idiomatic for the language?
- [ ] Would a new team member understand it without extensive context?

**Documentation & Testing:**
- [ ] Doc comments on public functions/classes?
- [ ] Code comments on non-obvious logic?
- [ ] Tests covering happy path, error conditions, edge cases?
- [ ] CHANGELOG.md updated for user-facing changes?

**Project Fit:**
- [ ] Does this code belong in this repo?
- [ ] Does it follow established patterns?

### Domain-Specific Checks

<!-- Customize for your project's domain. Examples for bioinformatics: -->

- [ ] Large file handling: streaming/chunked, not loaded into memory?
- [ ] File format edge cases handled: empty files, truncated files, malformed headers?
- [ ] Input validation performed early with clear error messages?
- [ ] Provenance preserved: input paths, parameters, versions logged where appropriate?

---

## Code Style

### Organization
- Extract logic into small–medium functions with clear inputs/outputs
- Scope variables tightly; limit visibility to where needed
- Use block comments for visual separation when function extraction isn't practical

### Naming
- Meaningful names, even if long: `species_to_ref_fasta_map` not `species_map`
- Short names only for tight scope (loop indices, single-line lambdas)
- Signal behavior in function names: `to_y()`, `is_valid()` → returns value; `update_x()` → side effect

### Documentation

**Doc comments (required on all public functions/classes):**
- What it does
- Parameters and return value
- Constraints, exceptions raised, side effects

**Code comments:**
- Explain non-obvious choices and complex logic
- Never comment self-evident code

### Type Signatures
- **Parameters:** Accept the most general type practical (e.g., `Iterable` over `List`)
- **Returns:** Return the most specific type without exposing implementation details

### Functions
- Functions should have **either** returns **or** side effects, not both
- Exceptions: logging, caching (where side effect is performance-only)

### Pragmatism
- Balance functional, OOP, and imperative—use what's clearest
- When in doubt, prefer pure functions and immutable data
- Know your utility libraries; contribute upstream rather than writing one-offs

---

## Domain Considerations

<!-- Customize for your project's domain. Examples for bioinformatics: -->

- Assume input files may be large (multi-GB); prefer streaming over loading into memory
- Validate file formats early; corrupt/truncated files are common
- Preserve provenance: log input file paths, parameters, tool versions

---

## Error Handling

- Fail fast with informative messages at I/O boundaries
- Use custom exception types for domain errors (e.g., `InvalidFastaError`)
- Never silently swallow exceptions; log or re-raise with context
- When a loop may generate multiple errors, collect them and raise once at the end
- Error messages should include: what failed, why, and how to fix (if known)

---

## External Dependencies

- Core logic must not shell out to external tools
- External tool orchestration belongs in pipeline definitions, not the toolkit
- If wrapping an external tool is necessary, isolate in a dedicated module with clear interface boundaries

---

## Performance

- Only optimize performance-critical sections (e.g., large-file processing, not config parsing)
- Workflow: profile → change → benchmark → accept or rollback
- Never guess; always measure
- Do not begin profiling or tuning without explicit user approval

---

## Testing

### Principles
- Generate test data programmatically; avoid committing test data files
- Test behavior, not implementation—tests should survive refactoring
- Cover: expected behavior, error conditions, boundary cases
- Scale rigor to code longevity: thorough for shared code, lighter for one-off scripts

### Coverage Expectations
- New public functions: at least one happy-path test + one error case
- Bug fixes: add a regression test that would have caught the bug
- Performance-critical code: include benchmark or explain in PR why not needed

---

## Documentation Maintenance

When modifying code, update as needed:
- [ ] Docstrings (if signature or behavior changed)
- [ ] CHANGELOG.md (if user-facing)
- [ ] README.md (if usage patterns changed)
- [ ] Migration notes (if breaking change)

Reference issue/PR numbers in CHANGELOG entries.

---

## Anti-patterns

| Don't | Do Instead |
|-------|------------|
| `except Exception` without re-raise | Catch specific exceptions or re-raise with context |
| Functions that mutate AND return | Separate into query (returns) and command (mutates) |
| Commented-out code | Delete it; git has history |
| `Any` type without explanation | Use `TypeVar`, `Protocol`, or add comment explaining why |
| Loading multi-GB files into memory | Stream or process in chunks |
| Vague error messages | Include file path, actual vs. expected, how to fix |

---

## Python-Specific

### Style
- Heavier use of classes and type annotations than typical Python
- Prefer `@dataclass(frozen=True)` and Pydantic models with `frozen=True`
- Isolate I/O at module boundaries; keep core logic as pure functions

### Typing
- **Required:** Type annotations on all function parameters and returns
- Annotate locals when: they become return values, or called function lacks hints
- Use type aliases or `NewType` for complex structures
- Avoid `Any`—prefer type alias or `TypeVar`
Loading