Skip to content

raghavimoningi/IntroToPython

Repository files navigation

🎬 MovieLens 100K Data Analysis with Python

Python Jupyter License Made with Pandas Visualization

A Python-based project for performing EDA, visualization, and insight generation on the MovieLens 100K dataset, exploring user ratings, demographics, and genre trends.


❓ Problem Statement

This project is built using the MovieLens 100K dataset from the GroupLens Research Project at the University of Minnesota. It is widely used for collaborative filtering and recommendation systems, but here, the goal is to demonstrate Python proficiency in data handling, analysis, and visualization.

Dataset:

  • Downloaded from MovieLens 100K
  • Files used:
    • u.data: user ratings
    • u.item: movie metadata
    • u.user: user demographics

Objectives:

  • Perform exploratory data analysis (EDA)
  • Create univariate plots for: rating, age, release date, gender, occupation
  • Visualize how popularity of genres has changed over the years
  • Display Top 25 movies by average rating (only movies with ≥ 100 ratings)
  • Verify:
    • Do men watch more Drama and Romance than women?
    • Do women watch more Sci-Fi than men?

This hands-on project demonstrates the use of Python libraries such as Pandas, NumPy, Matplotlib, and Seaborn to draw actionable insights from entertainment data.


📌 Project Overview

This project showcases key data analysis skills using Python by working with the MovieLens 100K dataset. It covers:

  • Data loading and wrangling from multiple sources
  • Exploratory Data Analysis (EDA)
  • Univariate and bivariate visualizations
  • Insight extraction on genre popularity and user behavior

🧠 Learning Objectives

  • Work with real-world structured datasets
  • Practice data cleaning and preprocessing
  • Build visualizations using Matplotlib and Seaborn
  • Draw business-relevant conclusions from user activity and preferences

📊 Project Tasks

✅ Load and merge data from:

  • u.data – Movie ratings
  • u.item – Movie metadata
  • u.user – User demographics

✅ Analyze and visualize:

  • Distribution of ratings, age, gender, and occupations
  • Genre popularity trends by year
  • Top 25 highest-rated movies with at least 100 reviews
  • Gender-based genre viewing preferences (Drama, Romance, Sci-Fi)

✅ Techniques used:

  • Pandas for manipulation
  • NumPy for numerical operations
  • Regex for string parsing
  • GroupBy, Merge, and Filtering
  • Histograms, Bar Plots, Line Plots, Heatmaps

🧰 Tech Stack

Category Tools
Language Python 3.8+
IDE/Notebook Jupyter Notebook
Libraries Pandas, NumPy, Matplotlib, Seaborn, Regex
Dataset MovieLens 100K
File Formats .data, .item, .user, .ipynb

📁 Folder Structure

.
├── Intro to python- project.ipynb     # Main Jupyter Notebook
├── ml-100k/                           # MovieLens dataset folder
│   ├── u.data
│   ├── u.item
│   └── u.user
├── README.md                          # Project documentation

📷 Example Visuals

📈 Genre popularity over years
🧑‍💼 User demographic breakdown
🎥 Top 25 movies by average rating (≥100 reviews)


📌 References


👤 Author

Your Name
Python Data Enthusiast | Data Science Learner
📧 moningi.raghavi@gmail.com
🔗 GitHubLinkedIn


📝 License

This project is licensed under the MIT License. See the LICENSE file for more details.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors