Insurance Risk Analytics

This repository contains code and analysis for the 10 Academy Week 1 Challenge, focusing on insurance risk analytics using Python, with exploratory data analysis (EDA), data versioning, hypothesis testing, and predictive modeling.

Setup

Clone the repo: git clone https://github.com/Bisrath1/Insurance-risk-analytics.git
Create virtual environment: python3 -m venv venv
Activate:
- On Unix/Linux/MacOS: source venv/bin/activate
- On Windows: venv\Scripts\activate
Install dependencies: pip install -r requirements.txt

Structure

.dvc/: Configuration for DVC local remote storage.
SM/data/: Synthetic claims dataset managed with DVC.
data/: Insurance datasets (insurance.csv, insurance_v2.csv) versioned with DVC.
figures/: Visualizations from modeling and analysis (e.g., SHAP plots).
notebooks/: Jupyter notebooks for EDA and analysis.
results/: Outputs from regression and classification models with SHAP analysis.
src/: Python scripts for reusable code (e.g., modeling, data processing).
task_3_hypothesis_testing/: Scripts and results for A/B testing.
.dvcignore: Files/folders excluded from DVC tracking.
LICENSE: MIT License for the project.
requirements.txt: Python dependencies, including DVC.

Progress

Task 1: Git and EDA
- Set up repository: https://github.com/Bisrath1/Insurance-risk-analytics.
- Performed EDA on insurance.csv, analyzing charges by region, sex, and smoker status (branch: task-1).
Task 2: DVC
- Initialized DVC for data version control.
- Versioned insurance.csv and insurance_v2.csv using DVC.
- Configured local DVC remote storage.
- Added synthetic claims dataset to SM/data/ (commit: 3 weeks ago).
Task 3: A/B Testing
- Conducted hypothesis testing on insurance charges by region and sex (branch: task-3).
- Findings: Statistically significant differences in charges were observed for the southeast region (details in task_3_hypothesis_testing/).
Task 4: Modeling
- Built regression and classification models for claim severity, premium, and claim probability (branch: task-4, merged in commit: 3 weeks ago).
- Best model: XGBoost (RMSE: X, R²: Y; X and Y are placeholders until the final metrics are recorded).
- SHAP analysis: Smoker status and age identified as most influential features (visualizations in figures/).
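
For readers unfamiliar with the two regression metrics named in Task 4, the sketch below shows how RMSE and R² are commonly computed. It is not taken from src/; the function names and the toy numbers are assumptions made purely for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: share of variance explained."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values (hypothetical, not results from this project).
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

print("RMSE:", rmse(y_true, y_pred))
print("R^2:", r_squared(y_true, y_pred))
```

A lower RMSE and an R² closer to 1 indicate a better fit, which is presumably how the models were ranked.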

Usage

Run Jupyter notebooks for EDA and visualizations:
```
jupyter lab
```
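
As a rough idea of the grouping done in the EDA (charges by region, sex, and smoker status), here is a minimal standard-library sketch. The inline sample, its values, and the helper `mean_charges_by` are hypothetical; the column names are assumed to match insurance.csv, and the notebooks may well use pandas instead.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Tiny made-up sample standing in for data/insurance.csv (illustrative values only).
SAMPLE_CSV = """region,sex,smoker,charges
southeast,male,yes,30000
southeast,female,no,8000
northwest,male,no,6000
northwest,female,yes,24000
southeast,male,no,10000
"""


def mean_charges_by(rows, column):
    """Return {category: mean charges} for one categorical column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(float(row["charges"]))
    return {key: mean(values) for key, values in groups.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column in ("region", "sex", "smoker"):
    print(column, mean_charges_by(rows, column))
```

To try it on the real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle for the CSV, assuming the header names match.
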
Fetch versioned data with DVC:
```
dvc pull
```
Execute modeling scripts:
```
python src/models.py
```
Run hypothesis tests:
```
python task_3_hypothesis_testing/run_tests.py
```
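
To illustrate the kind of comparison an A/B test on charges makes, the sketch below runs a two-sided permutation test on the difference in group means. This is a swapped-in technique chosen because it needs only the standard library; it is not necessarily what run_tests.py does, and the function name and sample values are hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up charges for two regions (illustrative values only).
southeast = [14000, 15500, 16200, 17100, 18300]
other = [11000, 11800, 12500, 13100, 13900]

p_value = permutation_test(southeast, other)
print("p-value:", p_value)
```

A small p-value suggests the observed gap would be unlikely if region had no effect on charges; the significance threshold remains a judgment call for the analyst.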

Recommendations

Adjust insurance premiums for smokers and high-risk regions, based on the EDA and modeling insights.
Implement the claim probability model to support risk-based pricing.

Contributing

Fork the repository.
Create a feature branch: git checkout -b feature-branch
Commit changes: git commit -m "Add feature"
Push to the branch: git push origin feature-branch
Open a pull request.

License

This project is licensed under the MIT License.

Contact

Maintained by Bisrath1. For questions or feedback, open a GitHub issue or contact the owner directly.
