A comprehensive statistical framework for analyzing A/B tests with both frequentist and Bayesian approaches. Make data-driven decisions with confidence.
This framework provides end-to-end A/B testing capabilities: from experimental design (sample size calculation) to post-test analysis (statistical significance, business impact). Built for data scientists who need rigorous statistical methods combined with intuitive visualizations.
- Two-proportion Z-test with p-values
- 95% confidence intervals
- Effect size calculations
- Multiple testing correction awareness
- Posterior probability distributions
- P(Treatment > Control) calculation
- Credible intervals
- Monte Carlo simulations (100K iterations)
- Pre-test sample size calculator
- Statistical power estimation
- Minimum Detectable Effect (MDE) analysis
- Test duration recommendations
- Revenue impact projections
- ROI calculations
- Risk assessment
- Actionable recommendations
- Real-time analysis
- Multiple visualization tabs
- Test simulation mode
- Sample size calculator
- Python 3.8+
- Streamlit: Interactive web framework
- Scipy & Statsmodels: Statistical testing
- Plotly: Interactive visualizations
- NumPy & Pandas: Data manipulation
- Jupyter Notebook: Analysis documentation
```
ab-testing-framework/
│
├── data/
│   └── ab_test_data.csv               # Synthetic test data
│
├── notebooks/
│   └── 01_ab_testing_analysis.ipynb   # Complete analysis
│
├── models/
│   ├── test_results.json              # Saved analysis results
│   └── ab_test_report.png             # Summary report
│
├── app.py                             # Streamlit dashboard
├── requirements.txt
├── README.md
└── .gitignore
```
- Python 3.8 or higher
- pip package manager
1. Clone the repository

```bash
git clone https://github.com/Emart29/ab-testing-framework.git
cd ab-testing-framework
```

2. Install dependencies

```bash
pip install -r requirements.txt
```

3. Run the application

```bash
streamlit run app.py
```

4. Open your browser and navigate to `http://localhost:8501`
- Control Group: 10,000 users, 11.45% conversion
- Treatment Group: 10,000 users, 14.09% conversion
- Absolute Lift: 2.64 percentage points
- Relative Lift: 23.06%
- P-value: < 0.000001 (highly significant)
- Z-statistic: 5.59
- Confidence Interval: [1.72%, 3.56%]
- Bayesian P(B>A): 100.0% (to Monte Carlo simulation precision)
- Revenue Lift per User: $3.27
- Annual Revenue Impact: $3.9M (for 100K monthly users)
- Recommendation: ✅ Launch Treatment
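The headline revenue figure follows directly from the per-user lift; a quick back-of-envelope check:

```python
# Reproducing the revenue projection from the sample results above
revenue_lift_per_user = 3.27   # $ per user, from the test results
monthly_users = 100_000

annual_impact = revenue_lift_per_user * monthly_users * 12
print(f"Annual revenue impact: ${annual_impact / 1e6:.1f}M")  # -> $3.9M
```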
1. Hypothesis Testing
   - H₀: p_treatment ≤ p_control
   - H₁: p_treatment > p_control
   - Significance level: α = 0.05
2. Two-Proportion Z-Test
   - Tests equality of proportions
   - Calculates asymptotic p-values
   - Provides confidence intervals
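A minimal sketch of the frequentist test with statsmodels, using the counts from the sample results above (the repo's own implementation may differ in details such as the CI method):

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest, confint_proportions_2indep

# Conversions and sample sizes from the sample results above
successes = np.array([1409, 1145])   # treatment, control
nobs = np.array([10_000, 10_000])

# One-sided test of H1: p_treatment > p_control
z_stat, p_value = proportions_ztest(successes, nobs, alternative="larger")

# 95% CI for the difference in proportions (treatment - control)
ci_low, ci_high = confint_proportions_2indep(
    successes[0], nobs[0], successes[1], nobs[1], compare="diff"
)

print(f"z = {z_stat:.2f}, p = {p_value:.2e}")
print(f"95% CI for lift: [{ci_low:.4f}, {ci_high:.4f}]")
```

With these counts the test reproduces the reported z ≈ 5.59 and a CI around the 2.64pp lift.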
- Prior Distribution: Uniform Beta(1,1)
- Posterior: Beta(successes + 1, failures + 1)
- Monte Carlo: 100,000 simulations
- Output: Direct probability statements
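The Beta-Binomial update and Monte Carlo step above can be sketched in a few lines of NumPy (counts again from the sample results; the repo may use a different seed or sampler):

```python
import numpy as np

rng = np.random.default_rng(42)

# Counts from the sample results above
conv_c, n_c = 1145, 10_000   # control
conv_t, n_t = 1409, 10_000   # treatment

# Uniform Beta(1, 1) prior -> Beta(successes + 1, failures + 1) posterior
post_c = rng.beta(conv_c + 1, n_c - conv_c + 1, size=100_000)
post_t = rng.beta(conv_t + 1, n_t - conv_t + 1, size=100_000)

# P(Treatment > Control) estimated by Monte Carlo
prob_t_beats_c = (post_t > post_c).mean()
print(f"P(Treatment > Control) = {prob_t_beats_c:.4f}")
```

This is why the dashboard can report a direct probability statement rather than a p-value.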
- Cohen's h: Effect size for proportions
- Power: 80% (industry standard)
- Accounts for: Type I and Type II errors
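The pre-test calculation can be sketched with statsmodels' power tools; the rates below come from the sample data, and the repo's own calculator may differ in details:

```python
import numpy as np
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p_control, p_treatment = 0.1145, 0.1409  # baseline and target rates

# Cohen's h: 2*arcsin(sqrt(p1)) - 2*arcsin(sqrt(p2))
h = proportion_effectsize(p_treatment, p_control)

# Required sample size per group for 80% power, one-sided alpha = 0.05
n_per_group = NormalIndPower().solve_power(
    effect_size=h, alpha=0.05, power=0.80, alternative="larger"
)
print(f"Cohen's h = {h:.3f}, n per group = {int(np.ceil(n_per_group)):,}")
```

Dividing the required sample size by expected daily traffic per arm gives the test-duration recommendation.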
- E-commerce: Test new checkout flows
- SaaS Products: Compare onboarding experiences
- Marketing: Evaluate campaign effectiveness
- Product Features: Validate new features
- Pricing: Test pricing strategies
- Proper hypothesis testing
- Multiple approaches (frequentist + Bayesian)
- Power analysis for experimental design
- Business impact quantification
- Statistical inference
- Experimental design
- Causal reasoning
- Business translation
- A/B test design and analysis
- Frequentist vs Bayesian statistics
- Sample size calculation
- Statistical power concepts
- Business metrics translation
- Interactive dashboard development
- Sequential testing (early stopping)
- Multi-armed bandit algorithms
- CUPED variance reduction
- Stratified analysis
- Multiple metric tracking
- Automated monitoring and alerts
- Integration with analytics platforms
- Statistical Methods: Two-proportion Z-test, Beta-Binomial conjugacy
- Sample Size: G*Power methodology
- Best Practices: Kohavi, Tang & Xu (Trustworthy Online Controlled Experiments)
- ✅ Conversion rate optimization
- ✅ Click-through rate testing
- ✅ Binary outcome metrics
- ✅ Independent user assignment
- Assumes independent observations
- Binary outcomes only (extend for continuous)
- No correction for multiple testing (implement Bonferroni/FDR if needed)
- Requires proper randomization
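If you test several metrics in one experiment, the Bonferroni/FDR corrections mentioned above are one statsmodels call; a sketch with hypothetical p-values:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from testing several metrics in the same experiment
p_values = [0.001, 0.01, 0.02, 0.20]

# Bonferroni: conservative family-wise error-rate control
reject_bonf, p_bonf, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")

# Benjamini-Hochberg: false discovery rate control (less conservative)
reject_fdr, p_fdr, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")

print("Bonferroni rejects:", list(reject_bonf))
print("BH-FDR rejects:   ", list(reject_fdr))
```

Note how BH-FDR rejects a third hypothesis that Bonferroni does not, illustrating the trade-off between the two corrections.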
A/B testing is the gold standard for causal inference in tech companies. This project demonstrates:
- Statistical Maturity: Understanding both frequentist and Bayesian approaches
- Practical Application: Sample size calculations prevent underpowered tests
- Business Acumen: Translating statistics into revenue impact
- Communication: Interactive dashboards for stakeholders
Most ML models are descriptive; A/B testing is prescriptive. This project shows an understanding of how to make causal claims and drive business decisions.
Emmanuel NWanguma
- LinkedIn: Emmanuel NWanguma
- GitHub: Emart29
- Email: nwangumaemmanuel29@gmail.com
This project is licensed under the MIT License.
- Statistical methodology based on industry best practices
- Inspired by experimentation platforms at major tech companies
- Built as part of a comprehensive data science portfolio
⭐ If this helped you understand A/B testing, please star the repo!
💬 Questions? Open an issue or reach out on LinkedIn