This project performs a complete SQL-based analysis of the Netflix Movies and TV Shows dataset, using Google BigQuery as the query engine and SQL as the primary analysis language.
The goal is to answer real business-style questions, extract insights, and practice professional SQL workflows used in data analytics and ML engineering roles.
netflix-sql-analysis/
├── week2_project_netflix_sql.ipynb # Full notebook with observations
├── dataset # Netflix Dataset
│   └── netflix_titles.csv
├── queries/ # All SQL queries used in analysis
│   ├── A1_movies_vs_tv.sql
│   ├── A2_titles_per_rating.sql
│   ├── A3_titles_per_year.sql
│   ├── A4_top_countries.sql
│   ├── B1_horror_titles.sql
│   ├── B2_uk_movies.sql
│   ├── B3_tvma_after_2015.sql
│   ├── B4_missing_director_but_cast.sql
│   ├── C1_countries_more_than_20.sql
│   ├── C2_rare_ratings.sql
│   ├── C3_years_over_100.sql
│   ├── D1_top_genres.sql
│   ├── D2_common_director.sql
│   └── D3_first_last_titles.sql
├── screenshots/ # Screenshots of query results
│   ├── A1_movies_vs_tv.png
│   ├── A2_titles_per_rating.png
│   └── ... etc
└── README.md
This project answers important business questions such as:
- How many Movies vs TV Shows are on Netflix?
- Which ratings are most common?
- Which countries produce the most content?
- How has Netflix's content changed over time?
- Which genres dominate the platform?
- Who are the most common directors?
- Are there rare ratings or niche content categories?
The analysis uses grouping, filtering, sorting, aggregations, string operations, and
BigQuery-specific functions such as UNNEST and SPLIT.
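As an illustration of the UNNEST and SPLIT pattern, the sketch below counts titles per genre by splitting the comma-separated `listed_in` column into one row per genre. It is a minimal example, not necessarily the query stored in `queries/D1_top_genres.sql`. The table path `netflix_db.netflix_titles` follows the setup described in this README, and the alias `raw_genre` is introduced here for illustration only.

```sql
-- Sketch: top genres by title count (BigQuery Standard SQL).
-- Assumes the table netflix_db.netflix_titles with a comma-separated
-- listed_in column, e.g. "Dramas, International Movies".
SELECT
  TRIM(raw_genre) AS genre,          -- strip the space left after each comma
  COUNT(*) AS title_count
FROM `netflix_db.netflix_titles`,
  UNNEST(SPLIT(listed_in, ',')) AS raw_genre  -- one output row per genre
GROUP BY genre
ORDER BY title_count DESC
LIMIT 10;
```

A title listed under several genres is counted once per genre, so the counts add up to more than the number of titles.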
- Google BigQuery (Primary Tool)
- SQL (Standard Syntax)
- Jupyter Notebook
- Python (for documentation only)
- GitHub for version control
- Movies dominate Netflix's library, significantly outnumbering TV Shows.
- The most common rating on the platform is TV-MA, reflecting mature content.
- The United States and India are the leading contributors of Netflix content.
- Release-year trends show a massive content expansion after 2015.
- Genres like International Dramas, Documentaries, and Comedies appear most frequently.
- A large portion of entries have missing director metadata, which affects deeper analysis.
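One way to quantify the missing director metadata is sketched below. It assumes the column is named `director` and that empty CSV fields were loaded either as NULL or as empty strings. Both are assumptions about how the table was imported.

```sql
-- Sketch: share of titles without a director (BigQuery Standard SQL).
-- Assumes netflix_db.netflix_titles has a director column (STRING).
SELECT
  COUNTIF(director IS NULL OR TRIM(director) = '') AS missing_director,
  COUNT(*) AS total_titles,
  ROUND(
    SAFE_DIVIDE(
      100 * COUNTIF(director IS NULL OR TRIM(director) = ''),
      COUNT(*)
    ),
    1
  ) AS pct_missing                   -- SAFE_DIVIDE returns NULL on an empty table
FROM `netflix_db.netflix_titles`;
```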
This project uses the Netflix Titles dataset, containing Movies and TV Shows with metadata such as:
- Title
- Type
- Rating
- Release Year
- Country
- Director
- Cast
- Genres (listed_in)
- Create a project
- Upload the CSV dataset to a dataset (e.g., netflix_db)
- Use table name: netflix_titles
All .sql files inside the queries/ folder can be run directly in BigQuery.
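After the upload, a short query can confirm that the table is reachable. The sketch below counts Movies against TV Shows. It assumes the example dataset name `netflix_db`, the table name `netflix_titles`, and a column named `type`. Adjust the path if a different dataset name was used.

```sql
-- Sketch: sanity check after loading the CSV (BigQuery Standard SQL).
-- Assumes the dataset is netflix_db and the table is netflix_titles.
SELECT
  `type`,                            -- expected values: 'Movie' or 'TV Show'
  COUNT(*) AS title_count
FROM `netflix_db.netflix_titles`
GROUP BY `type`
ORDER BY title_count DESC;
```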
File: week2_project_netflix_sql.ipynb
Review:
- Business questions
- SQL outputs
- Professional observations
Nicolas
Data Science & Machine Learning Engineer - Roadmap Week 2 Project