Skip to content

This project involves a comprehensive analysis of Netflix's movies and TV shows data using SQL. The goal is to extract valuable insights and answer various business questions based on the dataset. The following README provides a detailed account of the project's objectives, business problems, solutions, findings, and conclusions.

Notifications You must be signed in to change notification settings

Jatin-s16/Netflix_Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Netflix Movies and TV Shows Data Analysis using SQL

Overview

This project involves a comprehensive analysis of Netflix's movies and TV shows data using SQL. The goal is to extract valuable insights and answer various business questions based on the dataset. The following README provides a detailed account of the project's objectives, business problems, solutions, findings, and conclusions.

Objectives

  • Analyze the distribution of content types (movies vs TV shows).
  • Identify the most common ratings for movies and TV shows.
  • List and analyze content based on release years, countries, and durations.
  • Explore and categorize content based on specific criteria and keywords.

Dataset

The data for this project is sourced from the Kaggle dataset:

Schema

DROP TABLE IF EXISTS netflix;
CREATE TABLE netflix
(
    show_id      VARCHAR(5),
    type         VARCHAR(10),
    title        VARCHAR(250),
    director     VARCHAR(550),
    casts        VARCHAR(1050),
    country      VARCHAR(550),
    date_added   VARCHAR(55),
    release_year INT,
    rating       VARCHAR(15),
    duration     VARCHAR(15),
    listed_in    VARCHAR(250),
    description  VARCHAR(550)
);

Business Problems and Solutions

1. Count the Number of Movies vs TV Shows

SELECT 
	type,
	COUNT (*) AS total_count
FROM netflix
GROUP BY type;

Objective: Determine the distribution of content types on Netflix.

2. Find the Most Common Rating for Movies and TV Shows

WITH RatingCounts AS (
    SELECT 
        type,
        rating,
        COUNT(*) AS rating_count
    FROM netflix
    GROUP BY type, rating
),
RankedRatings AS (
    SELECT 
        type,
        rating,
        rating_count,
        RANK() OVER (PARTITION BY type ORDER BY rating_count DESC) AS rank
    FROM RatingCounts
)
SELECT 
    type,
    rating AS most_frequent_rating
FROM RankedRatings
WHERE rank = 1;

Objective: Identify the most frequently occurring rating for each type of content.

3. List All Movies Released in a Specific Year (e.g., 2020)

SELECT 
	title
FROM netflix
WHERE 
	type = 'Movie' 
	AND 
	release_year = 2020;

Objective: Retrieve all movies released in a specific year.

4. Find the Top 5 Countries with the Most Content on Netflix

SELECT 
	TRIM(UNNEST(STRING_TO_ARRAY(country, ','))) AS final_countries,     -- since multiple countries
	COUNT(*) AS total_count
FROM netflix
GROUP BY 1
ORDER BY total_count DESC
LIMIT 5;

Objective: Identify the top 5 countries with the highest number of content items.

5. Identify the Longest Movie

SELECT *
FROM(
	SELECT 
		DISTINCT title AS movie,
		SPLIT_PART(duration,' ',1)::numeric AS duration
  	FROM netflix
  	WHERE type ='Movie'
  	) AS T1
WHERE 
	duration = (
				SELECT
					MAX(SPLIT_PART(duration,' ',1)::numeric) 
				FROM netflix);

Objective: Find the movie with the longest duration.

6. Find Content Added in the Last 5 Years

SELECT *
FROM netflix
WHERE TO_DATE(date_added, 'Month DD, YYYY') >= CURRENT_DATE - INTERVAL '5 years';

Objective: Retrieve content added to Netflix in the last 5 years.

7. Find All Movies/TV Shows by Director 'Rajiv Chilaka'

SELECT 
	title
FROM(
	SELECT 
		*,
		UNNEST(STRING_TO_ARRAY(director, ',')) AS director_name
	FROM
	netflix
	) AS t1
WHERE 
	director_name = 'Rajiv Chilaka';

Objective: List all content directed by 'Rajiv Chilaka'.

8. List All TV Shows with More Than 5 Seasons

SELECT 
	title
FROM netflix
WHERE
	type = 'TV Show'
	AND
	SPLIT_PART(duration, ' ', 1)::INT > 5;

Objective: Identify TV shows with more than 5 seasons.

9. Count the Number of Content Items in Each Genre

SELECT 
    UNNEST(STRING_TO_ARRAY(listed_in, ',')) AS genre,
    COUNT(*) AS total_content
FROM netflix
GROUP BY 1;

Objective: Count the number of content items in each genre.

10.Find each year and the average numbers of content release in India on netflix.

return top 5 year with highest avg content release!

SELECT 
	release_year,
	COUNT(show_id) AS total_release,
	ROUND(
		COUNT(show_id)::numeric/(SELECT COUNT(show_id) FROM netflix WHERE country = 'India')::numeric * 100 
		,2
		)
		AS avg_release
FROM netflix
WHERE country = 'India' 
GROUP BY country, 1
ORDER BY avg_release DESC 
LIMIT 5;

Objective: Calculate and rank years by the average number of content releases by India.

11. List All Movies that are Documentaries

SELECT 
	title
FROM netflix
WHERE listed_in LIKE '%Documentaries';

Objective: Retrieve all movies classified as documentaries.

12. Find All Content Without a Director

SELECT * 
FROM netflix
WHERE director IS NULL;

Objective: List content that does not have a director.

13. Find How Many Movies Actor 'Salman Khan' Appeared in the Last 10 Years

SELECT * 
FROM netflix
WHERE 
	casts LIKE '%Salman Khan%'
	AND 
	release_year >= EXTRACT(YEAR FROM CURRENT_DATE) - 10;

Objective: Count the number of movies featuring 'Salman Khan' in the last 10 years.

14. Find the Top 10 Actors Who Have Appeared in the Highest Number of Movies Produced in India

SELECT 
	UNNEST(STRING_TO_ARRAY(casts, ',')) AS actor,
	COUNT(*)
FROM netflix
WHERE country = 'India'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;

Objective: Identify the top 10 actors with the most appearances in Indian-produced movies.

15. Categorize Content Based on the Presence of 'Kill' and 'Violence' Keywords

SELECT 
    category,
	type,
    COUNT(*) AS content_count
FROM (
    SELECT 
		*,
        CASE 
            WHEN 
				description ILIKE '%kill%' 
				OR 
				description ILIKE '%violence%' 
				THEN 'Bad'
            ELSE 'Good'
        END AS category
    FROM netflix
) AS categorized_content
GROUP BY 1,2
ORDER BY 2;

Objective: Categorize content as 'Bad' if it contains 'kill' or 'violence' and 'Good' otherwise. Count the number of items in each category.

Findings and Conclusion

  • Content Distribution: The dataset contains a diverse range of movies and TV shows with varying ratings and genres.
  • Common Ratings: Insights into the most common ratings provide an understanding of the content's target audience.
  • Geographical Insights: The top countries and the average content releases by India highlight regional content distribution.
  • Content Categorization: Categorizing content based on specific keywords helps in understanding the nature of content available on Netflix.

This analysis provides a comprehensive view of Netflix's content and can help inform content strategy and decision-making.

About

This project involves a comprehensive analysis of Netflix's movies and TV shows data using SQL. The goal is to extract valuable insights and answer various business questions based on the dataset. The following README provides a detailed account of the project's objectives, business problems, solutions, findings, and conclusions.

Topics

Resources

Stars

Watchers

Forks