Netflix Data Cleaning & Exploratory Data Analysis (EDA)
Project Overview
This project focuses on cleaning and exploring a Netflix dataset containing details about movies and TV shows available on the platform. The goal is to understand trends in Netflix's catalog, such as content types, release patterns, country contributions, and popular directors, while ensuring the data is clean and analysis-ready.
Objectives
- Clean and prepare the dataset for analysis.
- Identify and handle duplicates and missing values.
- Convert data types (e.g., dates) for consistency.
- Explore key business questions through EDA.
Tools & Libraries
- Python (Jupyter Notebook)
- Libraries: pandas, matplotlib, seaborn, numpy
Data Cleaning Process
Steps performed to prepare the dataset:
- Removed duplicate records to maintain data integrity.
- Identified and summarized missing values.
- Converted the Release_Date column to standard datetime format.
- Dropped irrelevant columns and standardized categorical values.
- Verified data types and ensured overall dataset consistency.
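The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it builds a tiny in-memory sample instead of loading data/Netflixd.csv, and every column name except Release_Date (Title, Category, Director) is an assumption, as is the "Month day, year" date format.

```python
import numpy as np
import pandas as pd

# Tiny illustrative sample. Title, Category and Director are assumed column
# names; only Release_Date is named in this README.
raw = pd.DataFrame({
    "Title": ["A", "A", "B", "C"],
    "Category": ["Movie", "Movie", "tv show ", "Movie"],
    "Director": ["X", "X", np.nan, "Y"],
    "Release_Date": ["January 1, 2020", "January 1, 2020",
                     "March 15, 2019", "not a date"],
})

# 1. Remove exact duplicate records.
df = raw.drop_duplicates().reset_index(drop=True)

# 2. Summarize missing values per column.
missing_summary = df.isna().sum()

# 3. Convert Release_Date to datetime. The format string is an assumption;
#    values that do not match it become NaT instead of raising an error.
df["Release_Date"] = pd.to_datetime(
    df["Release_Date"], format="%B %d, %Y", errors="coerce"
)

# 4. Standardize categorical values (trim whitespace, unify spelling).
df["Category"] = (
    df["Category"].str.strip().str.lower()
    .map({"movie": "Movie", "tv show": "TV Show"})
)

# 5. Verify the resulting data types.
print(missing_summary)
print(df.dtypes)
```

Using errors="coerce" keeps the conversion from failing on malformed dates, at the cost of introducing missing values that should be counted again afterwards.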
Exploratory Data Analysis
Key exploratory queries and findings:
- Content Type Distribution: Number of Movies vs TV Shows on Netflix.
- Release Trends: Year-wise release frequency, which shows Netflix's expansion.
- Regional Analysis: Titles produced or released in India and the UK.
- Top Directors: Identified the most prolific Netflix contributors.
- Actor Spotlight: Movies featuring Tom Cruise.
- Genre & Rating Patterns: Distribution of categories and rating types.
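As a rough sketch of how queries like these might look in pandas, the example below runs them on a small made-up table. The column names (Title, Category, Director, Cast, Country) and all sample values are hypothetical; only Release_Date is named in this README, and the notebook's actual code may differ.

```python
import pandas as pd

# Hypothetical sample data standing in for the cleaned dataset.
titles = pd.DataFrame({
    "Title": ["Film One", "Show Two", "Film Three", "Film Four", "Show Five"],
    "Category": ["Movie", "TV Show", "Movie", "Movie", "TV Show"],
    "Director": ["Dir A", "Dir B", "Dir A", None, "Dir B"],
    "Cast": ["Tom Cruise, Actor X", "Actor Y", "Actor Z", "Tom Cruise", None],
    "Country": ["United States", "India", "United Kingdom", "India",
                "United Kingdom, India"],
    "Release_Date": pd.to_datetime(
        ["2019-05-01", "2020-03-10", "2020-07-04", "2021-01-15", "2021-09-30"]
    ),
})

# Content type distribution: Movies vs TV Shows.
type_counts = titles["Category"].value_counts()

# Release trends: number of titles per release year.
releases_per_year = titles["Release_Date"].dt.year.value_counts().sort_index()

# Regional analysis: titles whose Country field mentions India.
india_titles = titles[titles["Country"].str.contains("India", na=False)]

# Top directors: most frequent names, missing directors ignored.
top_directors = titles["Director"].value_counts().head(10)

# Actor spotlight: movies whose Cast field mentions Tom Cruise.
tom_cruise_movies = titles[
    (titles["Category"] == "Movie")
    & titles["Cast"].str.contains("Tom Cruise", na=False)
]

print(type_counts)
print(releases_per_year)
print(tom_cruise_movies["Title"].tolist())
```

Passing na=False to str.contains treats rows with a missing Cast or Country as non-matches rather than propagating missing values into the filter.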
Example visuals from the notebook include:
- Bar charts for content distribution and release years.
- Countplots showing category breakdowns using Seaborn.
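A countplot of this kind could be produced along the following lines. This is only a sketch on invented data: the Category column name, the sample values, and the chart title are assumptions rather than the notebook's actual output.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample; "Category" is an assumed column name.
sample = pd.DataFrame(
    {"Category": ["Movie", "Movie", "TV Show", "Movie", "TV Show"]}
)

fig, ax = plt.subplots(figsize=(6, 4))

# One bar per category, with bar height equal to the number of rows.
sns.countplot(data=sample, x="Category", ax=ax)

ax.set_title("Movies vs TV Shows")
ax.set_xlabel("Category")
ax.set_ylabel("Number of titles")
fig.tight_layout()
```

In a notebook the figure renders inline; calling fig.savefig with a path under visuals/ would be one way to keep a copy alongside the README.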
Key Insights
- Movies make up the majority of Netflix's catalog, though TV Shows are growing steadily.
- The early 2020s show a spike in global releases.
- India and the UK remain strong contributors of Netflix Originals.
- Top directors have produced a significant portion of Netflix's total content.
- The dataset reveals diverse rating systems across countries.
Repository Structure
Netflix-EDA-Project/
│
├── NetflixSC.ipynb          # Main Jupyter Notebook
├── data/
│   └── Netflixd.csv         # Dataset file (optional upload)
├── visuals/
│   └── readme_banner.PNG    # Example chart or screenshot
└── README.md                # Project documentation
Skills Demonstrated
- Data Cleaning (duplicates, missing data, data type conversions)
- Exploratory Data Analysis (EDA)
- Data Visualization using Matplotlib & Seaborn
- Python and Jupyter Notebook workflow
- Analytical storytelling and business insight derivation
Contact
- Name: Onagadanalyst
- Role: Data & Market Analyst | Building Expertise in Augmented Analytics and AI Automation
- LinkedIn:
- GitHub: https://github.com/o-danalyst
- Email: onagatheanalyst@gmail.com