
Add geneLME: parallel per-gene LME with flexible contrast specification #2

Open · rmsegnitz wants to merge 1 commit into master from claude/reverent-satoshi



@rmsegnitz (Owner)

Summary

  • Adds geneLME(), a parallelised per-gene linear mixed effects modelling function built on lme4 + emmeans, designed as a drop-in alternative to kimma::kmFit with more flexible contrast specification
  • Supports two contrast branches: Branch A (explicit pairwise interaction contrasts via a filterable contrast_spec data frame) and Branch B (named contrast vectors on marginal means with optional grouping)
  • Second-order contrasts (contrasts-of-contrasts) in both branches
  • Singular fit handling: model_status = "singular_fit" flag instead of hard stop — results returned for all genes
  • Pre-computed Branch A contrast structures (geneLME_build_contrast_args()) eliminate per-gene rebuilds in parallel workers
  • Full pre-flight input validation (11 checks) with informative error messages
  • Benchmarked against kimma::kmFit (2,000 genes, 8 cores, 5 reps): 100% direction agreement; ~1.8× faster with 3–6 contrasts and equal speed with 66 contrasts
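To make the Branch B and second-order ideas concrete, here is a minimal base-R sketch of the arithmetic involved. It assumes a 2 × 2 design whose marginal means have already been estimated for one gene (in geneLME these would come from emmeans on an lmer fit). The means, level names, the list-of-weights shape used for `contrasts_primary` / `contrasts_secondary`, and the helper `apply_contrast()` are all hypothetical illustrations, not geneLME's actual argument format. Standard errors and p-values are omitted.

```r
# Hypothetical marginal means for one gene in a 2 x 2 design:
# treatment (ctrl, trt) crossed with time (t0, t1). Illustrative values only.
means <- c(ctrl_t0 = 5.0, trt_t0 = 5.5, ctrl_t1 = 5.2, trt_t1 = 6.9)

# Hypothetical helper: weighted sum of estimates. Echoes the soft-fail idea
# by warning and returning NA on a length mismatch instead of stopping.
apply_contrast <- function(w, est) {
  if (length(w) != length(est)) {
    warning("contrast has length ", length(w), ", expected ", length(est))
    return(NA_real_)
  }
  sum(w * est)
}

# First-order contrasts: one weight per marginal mean (assumed format).
contrasts_primary <- list(
  trt_vs_ctrl_t0 = c(-1, 1, 0, 0),
  trt_vs_ctrl_t1 = c(0, 0, -1, 1)
)
first_order <- vapply(contrasts_primary, apply_contrast, numeric(1),
                      est = means)

# Second-order contrast (contrast of contrasts): weights act on the
# first-order estimates, so its length must equal length(contrasts_primary).
contrasts_secondary <- list(trt_effect_t1_vs_t0 = c(-1, 1))
second_order <- vapply(contrasts_secondary, apply_contrast, numeric(1),
                       est = first_order)

first_order   # 0.5 and 1.7
second_order  # 1.2
```

The second-order estimate (1.7 - 0.5 = 1.2) is the treatment-by-time interaction expressed as a difference of simple effects. A length-3 vector in `contrasts_secondary` would take the warning path and yield `NA` rather than an error.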

Files added

| File | Description |
|---|---|
| geneLME.R | Main function file (stable, fully featured) |
| geneLME_test.R | Full test suite with mock data |
| geneLME_tutorial.Rmd / .html | Worked tutorial with annotated output |
| geneLME_function_overview.md / .html | Reference documentation |
| geneLME_benchmark.Rmd / .html | v1 benchmark vs kimma |
| geneLME_benchmark2.Rmd / .html | v2 benchmark: sign consistency + speed |
| geneLME_dev.R, geneLME_dev2.R | Development history (superseded; kept for reference) |
| CLAUDE_NOTES_geneLME.md | Session log and architecture notes |

Test plan

  • Source geneLME.R and run geneLME_test.R — all 7 tests should pass (Branch A, Branch A with second-order contrasts, Branch B, validation errors 6a–6f, soft-fail 6g)
  • Knit geneLME_tutorial.Rmd to verify tutorial renders cleanly
  • Review geneLME_benchmark2.html for sign consistency and speed results vs kimma

🤖 Generated with Claude Code

Introduces geneLME(), a parallelised per-gene linear mixed effects modelling
function built on lme4/emmeans, with full contrast support and benchmarking
against kimma::kmFit.

Key features:
- geneLME_contrast_spec(): pre-run helper to enumerate available contrast
  levels and build contrast_spec / contrasts_primary arguments
- geneLME_build_contrast_args(): pre-computes Branch A contrast vectors
  once before parallel dispatch (eliminates per-gene rebuilds)
- Branch A: explicit pairwise interaction contrasts via contrast_spec
- Branch B: named contrast vectors on marginal means with optional 'by' grouping
- Second-order contrasts (contrasts-of-contrasts) in both branches
- Singular fit flagging: model_status = "singular_fit" instead of hard stop;
  results returned for all genes, filter downstream on model_status
- Soft-fail when contrasts_secondary has the wrong length: an indexed
  $contrast_spec is returned for debugging
- FDR adjustment applied within each term (ANOVA results) or within each
  contrast x order group (contrast results)
- Warning suppression: lmer() rescaling warnings and package-version
  messages are silenced
- Pre-flight input validation with informative errors (11 checks)
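The FDR grouping and the singular-fit flag can be illustrated with a minimal base-R sketch. It assumes contrast results in a long data frame with one row per gene x contrast x order; the column names, the `"ok"` status value and the p-values are hypothetical (only `"singular_fit"` comes from the description above), and Benjamini-Hochberg is an assumed choice of FDR method.

```r
# Hypothetical long-format contrast results (illustrative p-values).
res <- data.frame(
  gene         = rep(c("g1", "g2", "g3"), times = 2),
  contrast     = rep(c("trt_vs_ctrl_t0", "trt_vs_ctrl_t1"), each = 3),
  order        = "primary",
  pval         = c(0.001, 0.02, 0.04, 0.03, 0.01, 0.5),
  model_status = c("ok", "singular_fit", "ok", "ok", "singular_fit", "ok"),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment computed separately within each
# contrast x order group, so each family holds one test per gene.
res$FDR <- ave(res$pval, res$contrast, res$order,
               FUN = function(p) p.adjust(p, method = "BH"))

# Singular fits are flagged, not dropped; filter downstream if desired.
res_ok <- res[res$model_status != "singular_fit", ]
```

Within the first contrast, the p-values 0.001, 0.02 and 0.04 become FDR values 0.003, 0.03 and 0.04. Pooling all six rows into one family would instead adjust 0.001 to 0.006, which is why the grouping matters.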

Benchmarked vs kimma::kmFit (2,000 genes, 8 cores, 5 reps):
- 100% direction agreement with kimma across all contrast pairs
- ~1.8x faster than kimma at 3–6 contrasts; equal speed at 66 contrasts
- Estimates vs kimma after direction correction: r = 1.0, MAD ≈ 0

Includes: test suite, tutorial (Rmd + HTML), function overview (md + HTML),
benchmark reports v1 and v2 (Rmd + HTML), dev history (geneLME_dev.R,
geneLME_dev2.R), and CLAUDE_NOTES session log.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>