This project simulates an end-to-end ETL (Extract, Transform, Load) process and data analysis workflow focused on the global unicorn startup landscape.
Because comprehensive open-source unicorn databases are scarce, the project uses Python to programmatically generate a synthetic but realistic relational dataset. The generated data is loaded into an in-memory SQLite database, extracted and aggregated with SQL, and then visualized to surface business insights.
- Data Engineering (ETL): Designed a data generation pipeline using `Faker` and `NumPy` vectorization to create thousands of data points efficiently.
- Database Management: Modeled relational data (Companies, Industries, Funding, Dates) and integrated it with an SQLite RDBMS.
- SQL Querying: Extracted and aggregated data using complex SQL `JOIN`, `GROUP BY`, and mathematical operations.
- Data Visualization: Translated raw data into clean, business-ready charts using `Seaborn` and `Matplotlib`.
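
The generate, load, and query flow described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: it swaps in the standard-library `random` module for `Faker` and `NumPy` so that it runs without extra dependencies, and the table names, column names, lookup values, and the `build_database` helper are all assumptions made for the example. The schema is also reduced to two tables.

```python
import random
import sqlite3

random.seed(42)  # reproducible synthetic data

# Hypothetical lookup values; the notebook may use different ones.
INDUSTRIES = ["Fintech", "AI", "E-commerce", "Healthtech"]
COUNTRIES = ["United States", "China", "India", "Germany"]


def build_database(n_companies=1000):
    """Create an in-memory SQLite database filled with synthetic companies."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE industries (
            industry_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE companies (
            company_id       INTEGER PRIMARY KEY,
            name             TEXT NOT NULL,
            country          TEXT NOT NULL,
            industry_id      INTEGER NOT NULL REFERENCES industries(industry_id),
            valuation_usd_bn REAL NOT NULL,
            year_joined      INTEGER NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO industries (industry_id, name) VALUES (?, ?)",
        list(enumerate(INDUSTRIES, start=1)),
    )
    # Assumed value ranges: valuations of at least 1 (billion USD), years 2010-2023.
    rows = [
        (
            i,
            f"Company {i}",
            random.choice(COUNTRIES),
            random.randint(1, len(INDUSTRIES)),
            round(random.uniform(1.0, 50.0), 2),
            random.randint(2010, 2023),
        )
        for i in range(1, n_companies + 1)
    ]
    conn.executemany("INSERT INTO companies VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = build_database()

# Aggregate across the two tables with JOIN and GROUP BY.
counts = conn.execute("""
    SELECT i.name, COUNT(*) AS n
    FROM companies AS c
    JOIN industries AS i ON i.industry_id = c.industry_id
    GROUP BY i.name
    ORDER BY n DESC
""").fetchall()
print(counts)
```

Because the database lives in memory, it disappears when the connection closes, which suits a notebook that regenerates its data on every run.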
- Language: Python 3.10
- Database: SQLite (In-Memory)
- Libraries: Pandas, NumPy, Faker, Matplotlib, Seaborn
Through SQL queries and visual analysis, this notebook answers three core business questions:
- Industry Dominance: Which industries produce the highest number of unicorn startups?
- Top Valuations: Which 5 companies have the highest valuations, and where are they geographically located?
- Historical Trends: How has the number of newly crowned unicorns grown over the years?
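
As a rough illustration of how these three questions map to SQL, the sketch below runs one query per question against a tiny hand-written dataset. The schema, column names, and the six placeholder companies are assumptions made for the example and do not come from the notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE industries (industry_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE companies (
        name TEXT, country TEXT, industry_id INTEGER,
        valuation_usd_bn REAL, year_joined INTEGER
    );
    INSERT INTO industries VALUES (1, 'Fintech'), (2, 'AI'), (3, 'Healthtech');
    -- Placeholder rows, not real companies.
    INSERT INTO companies VALUES
        ('Alpha',   'United States', 1, 12.0, 2019),
        ('Beta',    'China',         1,  8.5, 2020),
        ('Gamma',   'India',         2, 30.0, 2021),
        ('Delta',   'Germany',       2,  4.0, 2021),
        ('Epsilon', 'United States', 1,  2.5, 2021),
        ('Zeta',    'Brazil',        3,  1.2, 2020);
""")

# 1. Industry dominance: number of unicorns per industry.
by_industry = conn.execute("""
    SELECT i.name, COUNT(*) AS n
    FROM companies AS c
    JOIN industries AS i ON i.industry_id = c.industry_id
    GROUP BY i.name
    ORDER BY n DESC
""").fetchall()

# 2. Top valuations: the five most valuable companies and their countries.
top5 = conn.execute("""
    SELECT name, country, valuation_usd_bn
    FROM companies
    ORDER BY valuation_usd_bn DESC
    LIMIT 5
""").fetchall()

# 3. Historical trend: new unicorns per year.
by_year = conn.execute("""
    SELECT year_joined, COUNT(*) AS n
    FROM companies
    GROUP BY year_joined
    ORDER BY year_joined
""").fetchall()

print(by_industry)  # [('Fintech', 3), ('AI', 2), ('Healthtech', 1)]
print(top5[0])      # ('Gamma', 'India', 30.0)
print(by_year)      # [(2019, 1), (2020, 2), (2021, 3)]
```

Each result is a list of tuples, which can be passed to a pandas DataFrame for plotting.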
- `unicorn_analysis.ipynb`: The main Jupyter Notebook containing the ETL pipeline, SQL queries, and visualizations.
- `environment.yml`: Conda environment configuration file to reproduce the exact dependencies used in this project.
- `.gitignore`: Configured to ignore auto-generated dummy CSV files and local cache.
To replicate the environment and run the notebook locally, follow these steps:
1. Clone this repository:
   ```bash
   git clone https://github.com/subki72/unicorn-startup-analysis.git
   cd unicorn-startup-analysis
   ```
2. Create the conda environment from the provided configuration file:
   ```bash
   conda env create -f environment.yml
   ```
3. Activate the environment:
   ```bash
   conda activate unicorn_env
   ```
4. Launch Jupyter Notebook:
   ```bash
   jupyter notebook
   ```