Modern Python tools for complex survey analysis, built for real-world statistical workflows.
svy is a rigorously design-based yet production-oriented ecosystem for survey design, weighting, estimation, and small area estimation — without sacrificing transparency or scalability.
🌐 Website: https://svylab.com
📘 Documentation: https://svylab.com/docs
Tip
Validation: Want to assess the correctness of svy?
See our comparison with R’s survey package, which shows numerically identical results for Taylor linearization, replication methods, and complex survey designs.
The svy libraries are not yet publicly downloadable.
This repository is intentionally public before the code release so that early users can:
- ask questions,
- report documentation gaps,
- suggest features,
- discuss real-world survey use cases,
- help shape stable APIs.
📘 Documentation is live
🧪 Code is under finalization
🐞 Issues & discussions are open
When the first public releases are ready, this repository will become the main code home.
svy is designed for people who actually work with complex survey data, including:
- National statistical offices
- Public health and development programs
- Survey methodologists
- Data scientists working with complex samples
The guiding principle is:
Correct inference first — without hiding assumptions or sacrificing usability.
svy prioritizes statistical validity while remaining compatible with modern Python workflows.
The svy ecosystem is being built to support:
- Complex survey design (strata, clusters, weights)
- Design-based estimation with valid standard errors
- Replication methods (BRR, bootstrap, jackknife)
- Small Area Estimation (area- and unit-level models)
- Explicit, inspectable, reproducible outputs
- Integration with Polars, NumPy, SciPy, and JAX-based tooling
All methods are grounded in established survey methodology.
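As a rough illustration of what "design-based estimation with valid standard errors" involves, the sketch below computes a weighted mean and its Taylor-linearized standard error for a stratified cluster sample. It is not svy's implementation or API: the function name `weighted_mean_with_se` and the toy data are hypothetical, and the sketch assumes PSUs are sampled with replacement within strata (no finite population correction).

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_with_se(y, w, stratum, psu):
    """Weighted mean and Taylor-linearized SE (hypothetical helper, not svy's API).

    Assumes with-replacement sampling of PSUs within each stratum.
    """
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Linearized scores for the ratio sum(w*y) / sum(w), summed to PSU level.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, c in zip(y, w, stratum, psu):
        psu_totals[h][c] += wi * (yi - mean) / total_w

    # Between-PSU variance of the score totals, accumulated over strata.
    variance = 0.0
    for h, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has fewer than 2 PSUs")
        z_bar = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in totals.values())
    return mean, sqrt(variance)


# Toy data: two strata, two PSUs per stratum, two units per PSU.
y = [10, 12, 20, 22, 30, 34, 40, 44]
w = [1, 1, 1, 1, 2, 2, 2, 2]
stratum = ["A", "A", "A", "A", "B", "B", "B", "B"]
psu = [1, 1, 2, 2, 3, 3, 4, 4]

mean, se = weighted_mean_with_se(y, w, stratum, psu)
print(mean, se)
```

With equal weights, a single stratum, and every unit treated as its own PSU, this reduces to the familiar simple random sampling formula, the square root of s²/n. That makes a convenient sanity check.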
The example below shows the intended public API. It reflects the current design and cannot be run until the first release.
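```bash
pip install svy
```

```python
import svy

# Load a bundled example dataset.
hld_data = svy.load_dataset(name="hld_sample_wb_2023", limit=None)

# Describe the design: strata, primary sampling units (PSUs), and weights.
hld_design = svy.Design(stratum=("geo1", "urbrur"), psu="ea", wgt="hhweight")
hld_sample = svy.Sample(data=hld_data, design=hld_design)

# Design-based estimate of the mean of tot_exp.
tot_exp_mean = hld_sample.estimation.mean(y="tot_exp")
print(tot_exp_mean)
```

```bash
pip install svy-sae
```

```python
import svy  # needed for load_dataset and Cat
import svy_sae as sae

milk = svy.load_dataset(name="milk", limit=None)
milk_model = sae.AreaLevel(milk)

# Area-level (Fay-Herriot) model fit by REML, with Prasad-Rao MSE estimates.
milk_preds = milk_model.fh(
    y="yi",
    x=svy.Cat("MajorArea", ref=1),
    variance="variance",
    area="SmallArea",
    method="REML",
    mse="prasad_rao",
)
print(milk_preds)
```

No shortcuts.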
No hidden assumptions.
Just correct survey inference.
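For readers new to area-level small area estimation, the sketch below shows the shrinkage idea behind a Fay-Herriot predictor in its simplest form. It is a deliberately reduced illustration, not svy-sae's implementation: the function name `fay_herriot_blup` is hypothetical, the model has an intercept only (no covariates), and the model variance is passed in as known rather than estimated by REML. Prasad-Rao MSE estimation is not reproduced.

```python
def fay_herriot_blup(direct, sampling_var, model_var):
    """Intercept-only Fay-Herriot predictor (hypothetical helper, not svy-sae's API).

    direct       -- direct survey estimates, one per area
    sampling_var -- sampling variances of the direct estimates
    model_var    -- variance of the area random effects, treated as known here
    """
    total_var = [model_var + psi for psi in sampling_var]

    # GLS estimate of the common mean: inverse-variance weighted average.
    beta = sum(y / v for y, v in zip(direct, total_var)) / sum(1 / v for v in total_var)

    # Shrinkage factor: share of each area's total variance due to the area effect.
    gamma = [model_var / v for v in total_var]

    # Each prediction is a compromise between the direct estimate and the model mean.
    return [g * y + (1 - g) * beta for g, y in zip(gamma, direct)]


preds = fay_herriot_blup(direct=[1.0, 3.0], sampling_var=[1.0, 1.0], model_var=1.0)
print(preds)
```

Areas with noisy direct estimates (large sampling variance) are pulled more strongly toward the model mean, while precisely estimated areas stay close to their direct estimates.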
| Package | Purpose | Status |
|---|---|---|
| svy | Core survey design & estimation | In progress |
| svy-sae | Small Area Estimation | In progress |
| svy-io | SPSS / Stata / SAS I/O | In progress |
Installation instructions will be added once packages are published.
The documentation includes conceptual guides, tutorials, and methodological notes reflecting the intended stable APIs.
Early feedback is strongly encouraged.
- Issues: https://github.com/samplics-org/svy/issues
- Discussions: https://github.com/samplics-org/svy/discussions
If you work with complex surveys and want to influence the design of a modern Python survey stack, this is the right place to engage.
MIT License
Copyright © 2026 Samplics LLC
svy is built for practitioners who need statistical rigor that survives contact with reality.