SQL analysis of Netflix dataset using BigQuery, business insights, queries, EDA, and professional documentation.

0xNic11/netflix-sql-analysis

📊 Netflix SQL Analysis (BigQuery Mini-Project)

Week 2: Data Science & Machine Learning Roadmap

This project performs a complete SQL-based analysis of the Netflix Movies and TV Shows dataset, using Google BigQuery as the query engine and SQL as the primary analysis language.
The goal is to answer real business-style questions, extract insights, and practice professional SQL workflows used in data analytics and ML engineering roles.


📁 Project Structure

netflix-sql-analysis/
│── week2_project_netflix_sql.ipynb   # Full notebook with observations
│── dataset                           # Netflix dataset
│   ├── netflix_titles.csv
│── queries/                          # All SQL queries used in the analysis
│   ├── A1_movies_vs_tv.sql
│   ├── A2_titles_per_rating.sql
│   ├── A3_titles_per_year.sql
│   ├── A4_top_countries.sql
│   ├── B1_horror_titles.sql
│   ├── B2_uk_movies.sql
│   ├── B3_tvma_after_2015.sql
│   ├── B4_missing_director_but_cast.sql
│   ├── C1_countries_more_than_20.sql
│   ├── C2_rare_ratings.sql
│   ├── C3_years_over_100.sql
│   ├── D1_top_genres.sql
│   ├── D2_common_director.sql
│   ├── D3_first_last_titles.sql
│── screenshots/                      # Screenshots of query results
│   ├── A1_movies_vs_tv.png
│   ├── A2_titles_per_rating.png
│   └── ... etc
│── README.md

🎯 Project Objectives

This project answers important business questions such as:

  • How many Movies vs TV Shows are on Netflix?
  • Which ratings are most common?
  • Which countries produce the most content?
  • How has Netflix’s content changed over time?
  • Which genres dominate the platform?
  • Who are the most common directors?
  • Are there rare ratings or niche content categories?

The analysis uses grouping, filtering, sorting, aggregations, string operations, and
BigQuery-specific functions such as UNNEST and SPLIT.
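
As a rough illustration of how UNNEST and SPLIT can answer the genre question, the sketch below splits the comma-separated listed_in field into one row per genre and counts titles per genre. It is not the content of queries/D1_top_genres.sql. The dataset and table names (netflix_db.netflix_titles) follow the setup steps in this README, and the alias raw_genre is introduced here purely for illustration.

```sql
-- Sketch only: assumes the table netflix_db.netflix_titles exists and that
-- listed_in holds comma-separated genre names in a single STRING column.
SELECT
  TRIM(raw_genre) AS genre,      -- strip the space left after each comma
  COUNT(*) AS title_count
FROM netflix_db.netflix_titles,
  UNNEST(SPLIT(listed_in, ',')) AS raw_genre  -- one output row per genre per title
GROUP BY genre
ORDER BY title_count DESC
LIMIT 10;
```

Rows where listed_in is NULL produce no genre rows and therefore drop out of the counts. A title tagged with several genres is counted once for each of them, so the counts add up to more than the number of titles.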


🛠️ Tools & Technologies

  • Google BigQuery (Primary Tool)
  • SQL (Standard Syntax)
  • Jupyter Notebook
  • Python (for documentation only)
  • GitHub for version control

📊 Key Insights

  • Movies dominate Netflix’s library, significantly outnumbering TV Shows.
  • The most common rating on the platform is TV-MA, reflecting mature content.
  • The United States and India are the leading contributors of Netflix content.
  • Release-year trends show a massive content expansion after 2015.
  • Genres like International Dramas, Documentaries, and Comedies appear most frequently.
  • A large portion of entries have missing director metadata, which affects deeper analysis.
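
The missing-director observation could be quantified with a query along these lines. This is a sketch under assumptions: the column is assumed to be named director, and blank values may have been loaded either as NULL or as empty strings depending on how the CSV was imported, so both cases are checked.

```sql
-- Sketch only: column name `director` and table netflix_db.netflix_titles
-- are assumptions based on the dataset description in this README.
SELECT
  COUNT(*) AS total_titles,
  COUNTIF(director IS NULL OR TRIM(director) = '') AS missing_director,
  ROUND(
    SAFE_DIVIDE(
      100 * COUNTIF(director IS NULL OR TRIM(director) = ''),
      COUNT(*)
    ),
    1
  ) AS missing_director_pct     -- NULL instead of an error if the table is empty
FROM netflix_db.netflix_titles;
```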

📥 Dataset

This project uses the Netflix Titles dataset, containing Movies and TV Shows with metadata such as:

  • Title
  • Type
  • Rating
  • Release Year
  • Country
  • Director
  • Cast
  • Genres (listed_in)
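
Only listed_in is confirmed above as an exact column name, so it may be worth checking the others once the table is loaded. A minimal sketch, assuming the dataset is called netflix_db and the table netflix_titles as in the setup steps:

```sql
-- Sketch only: lists the columns BigQuery created when the CSV was loaded.
SELECT
  column_name,
  data_type
FROM netflix_db.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'netflix_titles'
ORDER BY ordinal_position;
```

If the table has a column named cast, note that CAST is a reserved keyword in BigQuery, so the column needs backticks when referenced in a query.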

🚀 How to Run This Project

1. Open Google BigQuery

  • Create a project
  • Create a BigQuery dataset (e.g., netflix_db) and upload netflix_titles.csv into it
  • Name the table netflix_titles

2. Run the SQL queries

All .sql files inside the queries/ folder can be run directly in BigQuery.
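
A quick way to confirm the table is reachable before running the full set is a simple aggregate such as the one below. It is a sketch in the spirit of the Movies vs TV Shows question, not a copy of queries/A1_movies_vs_tv.sql, and it assumes the table lives at netflix_db.netflix_titles with a column named type.

```sql
-- Sketch only: counts titles per content type as a smoke test.
SELECT
  type,
  COUNT(*) AS title_count
FROM netflix_db.netflix_titles
GROUP BY type
ORDER BY title_count DESC;
```

If the dataset or table was given a different name, the FROM clause here and in the files under queries/ would need to be adjusted to match.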

3. Open the Jupyter Notebook

File: week2_project_netflix_sql.ipynb
Review:

  • Business questions
  • SQL outputs
  • Professional observations

📘 Author

Nicolas
Data Science & Machine Learning Engineer, Roadmap Week 2 Project

