A Claude Code skill that transforms Claude into a professional statistical consultant. Instead of blindly running whatever method the user requests, it diagnoses first, then analyzes — checking data quality, validating assumptions, and automatically selecting the appropriate statistical method.
When you upload a dataset and describe your analysis needs, Claude works through four stages:
- Data Profile (automatic) — Sample size, variable types, missing patterns, distributions, outliers
- Assumption Checking — Normality, homogeneity of variance, multicollinearity; auto-switches methods when assumptions fail
- Execute Analysis — Runs the appropriate statistical method with proper controls
- Output Triple — Every analysis produces three deliverables:
  - APA 7th edition table (Excel + Markdown)
  - Publication-quality figure (300 dpi PNG)
  - Results paragraph ready for manuscript (English/Chinese)
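
As a rough illustration of the data profile stage, the sketch below summarizes each column with pandas. The function name `profile_data`, the 1.5 × IQR outlier rule, and the toy data are assumptions made for this example; the skill's own profiling may use different checks.

```python
import numpy as np
import pandas as pd


def profile_data(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, valid N, missing %, skewness, outlier count."""
    rows = []
    for name in df.columns:
        col = df[name]
        row = {
            "variable": name,
            "dtype": str(col.dtype),
            "n_valid": int(col.notna().sum()),
            "pct_missing": round(float(col.isna().mean()) * 100, 1),
            "skew": np.nan,
            "n_outliers": np.nan,
        }
        if pd.api.types.is_numeric_dtype(col):
            valid = col.dropna()
            q1, q3 = valid.quantile([0.25, 0.75])
            iqr = q3 - q1
            # Assumed rule: flag values beyond 1.5 * IQR from the quartiles
            outside = (valid < q1 - 1.5 * iqr) | (valid > q3 + 1.5 * iqr)
            row["skew"] = float(valid.skew())
            row["n_outliers"] = int(outside.sum())
        rows.append(row)
    return pd.DataFrame(rows).set_index("variable")


# Toy data for illustration only
demo = pd.DataFrame({
    "score": [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0, np.nan],
    "group": ["a", "b", "a", "b", "a", "b", "a", "b"],
})
profile = profile_data(demo)
print(profile)
```

In this toy example the `score` column has one missing value and one extreme value (95.0), both of which show up in the profile before any test is chosen.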
| Complexity | Methods | Workflow |
|---|---|---|
| Simple | Descriptive stats, t-test, chi-square, correlation, reliability | Fast path (0 confirmations) |
| Medium | Regression, ANOVA, moderation, mediation, ROC/AUC, survival | Light path (1 confirmation) |
| Complex | SEM/CFA, HLM, IRT, meta-analysis, RI-CLPM | Full path (3-4 confirmations) |
| Planning | Power analysis / sample size calculation | Dedicated path (no data needed) |
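
The planning path needs no data, only design inputs. A minimal sketch of a sample size calculation for an independent-samples t-test, using `statsmodels`, might look like the following. The effect size, alpha, and power values are illustrative assumptions, not defaults taken from the skill.

```python
import math

from statsmodels.stats.power import TTestIndPower

# Illustrative design inputs (assumed for this example)
effect_size = 0.5   # Cohen's d
alpha = 0.05        # two-sided significance level
power = 0.80        # target power

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size,
    alpha=alpha,
    power=power,
    ratio=1.0,                  # equal group sizes
    alternative="two-sided",
)
# Round up, since a fractional participant is not possible
n_required = math.ceil(float(n_per_group))
print(f"Required sample size per group: {n_required}")
```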

    # Copy to your Claude skills directory
    cp -r . ~/.claude/skills/statistical-analysis/

Claude Code will automatically detect and activate this skill when you:
- Upload a data file (.xlsx, .csv, .sav)
- Say "help me analyze" / "run statistical analysis"
- Mention a specific method (t-test, regression, SEM, etc.)
For advanced methods that require R (SEM path diagrams, HLM, IRT):

    cd docker/
    chmod +x r-stat.sh
    ./r-stat.sh build   # Build the Docker image (~2GB)
    ./r-stat.sh test    # Verify installation

Pre-installed R packages: lavaan, lme4, metafor, mirt, psych, tidyverse, semPlot, effectsize, and 30+ more.

    .
    ├── SKILL.md                  # Core skill definition (Claude reads this)
    ├── references/
    │   ├── methods-index.md      # Method selection decision trees
    │   ├── code-patterns.md      # Python/R code templates
    │   ├── table-formats.md      # APA table format specifications
    │   └── full-workflow.md      # Full path detailed workflow
    ├── docker/
    │   ├── Dockerfile            # R environment (rocker/tidyverse + 40 packages)
    │   ├── r-stat.sh             # Convenience script for Docker operations
    │   ├── README.md             # Docker setup guide
    │   └── examples/             # SEM, HLM, meta-analysis R examples
    └── assets/
        └── report-template.md    # Analysis report template

The v3 approach was "user says do X, we do X." The v4 approach is "check the data and assumptions first, then decide what to do." This prevents common mistakes like:
- Running parametric tests on non-normal data
- Missing multicollinearity in regression
- Using wrong effect size measures
- Ignoring missing data patterns
When assumptions fail, the skill automatically switches to the appropriate alternative and informs the user:
> Note: Variable X failed the normality test (Shapiro-Wilk p = .003),
> automatically switched to Mann-Whitney U test (non-parametric alternative).
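
The switching logic can be sketched with `scipy` as below. This is an illustration under stated assumptions rather than the skill's actual code: the function name `compare_two_groups`, the 0.05 threshold, and the choice of Welch's t-test as the parametric branch are all picked for the example.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Choose Welch's t-test or Mann-Whitney U based on Shapiro-Wilk results."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Use the smaller p-value: either group failing is enough to switch
    shapiro_p = min(stats.shapiro(a).pvalue, stats.shapiro(b).pvalue)
    if shapiro_p < alpha:
        method = "Mann-Whitney U"
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
        note = (
            f"Note: normality check failed (Shapiro-Wilk p = {shapiro_p:.3f}); "
            "switched to Mann-Whitney U test (non-parametric alternative)."
        )
    else:
        method = "Welch t-test"  # assumed parametric default for this sketch
        result = stats.ttest_ind(a, b, equal_var=False)
        note = ""
    return {
        "method": method,
        "statistic": float(result.statistic),
        "p_value": float(result.pvalue),
        "note": note,
    }


# Toy data: one group with an extreme value, one roughly symmetric group
skewed = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 1.0, 0.9, 25.0]
symmetric = [3.4, 4.0, 4.4, 4.7, 4.9, 5.1, 5.3, 5.6, 6.0, 6.6]

switched = compare_two_groups(skewed, symmetric)
kept = compare_two_groups(symmetric, [x + 1.0 for x in symmetric])
print(switched["method"], "|", kept["method"])
```

Returning the note alongside the result keeps the switch visible to the user instead of silently changing the method.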
Every analysis must produce all three outputs (table + figure + paragraph). A built-in check_output_triplet() mechanism ensures nothing is skipped.
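
The source names `check_output_triplet()` but does not show its signature, so the following is only a hypothetical sketch of what such a check could do: confirm that the table, figure, and paragraph files exist and are non-empty. The arguments and the file-naming convention are assumptions made for this example.

```python
import tempfile
from pathlib import Path

# Hypothetical naming convention; the skill's real check may differ.
REQUIRED_OUTPUTS = {
    "table": ("{stem}_table.xlsx", "{stem}_table.md"),
    "figure": ("{stem}_figure.png",),
    "paragraph": ("{stem}_results.md",),
}


def check_output_triplet(out_dir, stem):
    """Return the file names of expected deliverables that are missing or empty."""
    out_dir = Path(out_dir)
    missing = []
    for patterns in REQUIRED_OUTPUTS.values():
        for pattern in patterns:
            path = out_dir / pattern.format(stem=stem)
            if not path.is_file() or path.stat().st_size == 0:
                missing.append(path.name)
    return missing


# Demo: only the Markdown table exists, so three files are reported missing
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "regression_table.md").write_text("| B | SE |\n")
    demo_missing = check_output_triplet(tmp, "regression")
    print(demo_missing)
```

An empty return value would mean all three deliverables are present; anything else lists what still has to be generated.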
User: "I have survey data, please analyze the relationship between gaming addiction and mental health, controlling for demographics."
Claude will:
- Run data profile (N, distributions, missing patterns)
- Check: normality, VIF, outliers
- Recommend: hierarchical regression (Step 1: demographics, Step 2: gaming addiction)
- Execute with HC3 robust standard errors (if heteroscedasticity detected)
- Output: regression table + coefficient forest plot + APA results paragraph
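
The steps above can be sketched with `statsmodels` as follows. The data are simulated and the variable names (`age`, `female`, `gaming_addiction`, `mental_health`) are invented for the example; using the Breusch-Pagan test to detect heteroscedasticity is also an assumption, since the source does not say which test the skill applies.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated survey data (illustrative only, not real results)
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "age": rng.normal(20, 2, n),
    "female": rng.integers(0, 2, n),
    "gaming_addiction": rng.normal(0, 1, n),
})
df["mental_health"] = (
    50 - 0.2 * df["age"] + 1.0 * df["female"]
    - 3.0 * df["gaming_addiction"] + rng.normal(0, 5, n)
)

# Step 1: demographics only; Step 2: add the focal predictor
step1 = smf.ols("mental_health ~ age + female", data=df).fit()
formula2 = "mental_health ~ age + female + gaming_addiction"
step2 = smf.ols(formula2, data=df).fit()

# Variance inflation factors for the Step 2 predictors (intercept skipped)
exog = step2.model.exog
names = step2.model.exog_names
vif = {names[i]: variance_inflation_factor(exog, i) for i in range(1, exog.shape[1])}

# Assumed heteroscedasticity check: Breusch-Pagan on Step 2 residuals
_, bp_pvalue, _, _ = het_breuschpagan(step2.resid, exog)
cov_type = "HC3" if bp_pvalue < 0.05 else "nonrobust"
final = smf.ols(formula2, data=df).fit(cov_type=cov_type)

# R-squared change and F-test for the added block
delta_r2 = step2.rsquared - step1.rsquared
f_change, p_change, _ = step2.compare_f_test(step1)

print(f"VIF: {vif}")
print(f"Standard errors: {final.cov_type}")
print(f"Delta R2 = {delta_r2:.3f}, F change p = {p_change:.4f}")
print(final.params)
```

Refitting with `cov_type="HC3"` changes only the standard errors and test statistics; the coefficient estimates are the same as in the ordinary fit.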
- Claude Code with skills support
- Python: pandas, numpy, scipy, statsmodels, matplotlib, seaborn (installed automatically)
- R Docker (optional): Docker Desktop for advanced methods
MIT License - see LICENSE
- v4.1 (2025-02-09): Mandatory output triple — figure generation embedded in all workflow paths
- v4.0 (2025-02-09): "Diagnose before analyze" — data profiling, assumption checking, APA paragraph generation, medical research methods
- v3.0 (2025-02-02): Three-tier path system, R Docker environment, APA table specifications