This project analyzes a Netflix dataset using Python (pandas, with SQL queries run through SQLite), cleans the data, and visualizes the resulting insights in Power BI. The objective is to explore content distribution, types, durations, and trends over time.
netflix-data-analysis/
├── data/
│   ├── netflix_titles.xlsx      # Raw Netflix dataset
│   └── CLEANED_DATA.xlsx        # Cleaned data after preprocessing
├── scripts/
│   ├── clean.py                 # Python script for data cleaning
│   └── clean_using_sql.py       # Python script to query data using SQLite
├── database/
│   └── cleaned_data.db          # SQLite database generated from cleaned data
├── dashboard/
│   └── netflix_dashboard.pbix   # Power BI dashboard file
└── README.md                    # Project documentation
- Removed or handled null values
- Standardized column formats (e.g., `date_added`, `duration`)
- Resolved data type mismatches (e.g., year to integer)
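The cleaning steps above could look roughly like the following sketch. It is not the contents of `clean.py`. Apart from `date_added` and `duration`, the column names (`title`, `country`, `release_year`), the sample rows, and the helper `clean_titles` are assumptions made for illustration.

```python
import pandas as pd


def clean_titles(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three cleaning steps to a copy of the raw table (illustrative only)."""
    # Handle nulls: drop rows without a title, label missing countries.
    out = df.dropna(subset=["title"]).copy()
    out["country"] = out["country"].fillna("Unknown")

    # Standardize date_added: trim whitespace, parse to datetime (bad values become NaT).
    out["date_added"] = pd.to_datetime(out["date_added"].str.strip(), errors="coerce")

    # Standardize duration: split "90 min" / "2 Seasons" into a number and a unit.
    parts = out["duration"].str.extract(r"(?P<value>\d+)\s*(?P<unit>\w+)")
    out["duration_value"] = pd.to_numeric(parts["value"]).astype("Int64")
    out["duration_unit"] = parts["unit"].str.lower().str.rstrip("s")

    # Resolve the type mismatch: year stored as text or float becomes an integer.
    out["release_year"] = pd.to_numeric(out["release_year"], errors="coerce").astype("Int64")
    return out.reset_index(drop=True)


# Hypothetical sample rows standing in for netflix_titles.xlsx.
raw = pd.DataFrame({
    "title": ["A", "B", None],
    "country": ["India", None, "Japan"],
    "date_added": ["September 25, 2021", " August 14, 2020", "January 1, 2019"],
    "release_year": ["2020", 2021.0, 2018],
    "duration": ["90 min", "2 Seasons", "100 min"],
})
cleaned = clean_titles(raw)
print(cleaned)
```

Whether to drop or fill a missing value depends on the column. The choices here (drop untitled rows, fill unknown countries) are just one plausible policy.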
- Used `sqlite3` and `pandas.read_sql()` to run SQL queries on the dataset
- Extracted insights like:
  - Number of shows per year
  - Movies vs TV shows distribution
  - Country-wise production frequency
  - Duration patterns by content type
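As a minimal sketch of how such queries might be run, the example below loads a few rows into an in-memory SQLite database and reads aggregates back with `pandas.read_sql()`. The table name `titles`, the column names, and the sample rows are assumptions for illustration, not the project's actual schema or the contents of `clean_using_sql.py`.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows standing in for the cleaned dataset.
cleaned = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "country": ["India", "Japan", "India", "Japan"],
    "release_year": [2019, 2020, 2020, 2020],
    "duration_value": [90, 2, 110, 100],  # minutes for movies, seasons for TV shows
})

# In-memory database; a persistent one would pass a file path instead of ":memory:".
conn = sqlite3.connect(":memory:")
cleaned.to_sql("titles", conn, index=False, if_exists="replace")

# Number of titles per release year.
per_year = pd.read_sql(
    "SELECT release_year, COUNT(*) AS n_titles FROM titles "
    "GROUP BY release_year ORDER BY release_year",
    conn,
)

# Movies vs TV shows.
by_type = pd.read_sql(
    "SELECT type, COUNT(*) AS n_titles FROM titles "
    "GROUP BY type ORDER BY n_titles DESC",
    conn,
)

# Country-wise production frequency.
by_country = pd.read_sql(
    "SELECT country, COUNT(*) AS n_titles FROM titles "
    "GROUP BY country ORDER BY country",
    conn,
)

# Average duration per content type (units differ between types, so never mix them).
avg_duration = pd.read_sql(
    "SELECT type, AVG(duration_value) AS avg_duration FROM titles "
    "GROUP BY type ORDER BY type",
    conn,
)
conn.close()

print(per_year, by_type, by_country, avg_duration, sep="\n\n")
```

Grouping durations by `type` keeps minutes and seasons apart, which is why the duration query aggregates per content type rather than over the whole table.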
- Map of content distribution by country
- Line/bar chart showing release trends over the years
- Comparison of durations across content types
- Filters by genre, country, and type
- Python (pandas, sqlite3)
- Power BI (interactive dashboards)
- SQLite (in-memory and persistent databases)
- Excel (for initial and final data handling)