This repository contains various Python projects that I worked on while learning data analysis and modeling. These projects cover different topics, including exploratory data analysis and prediction modeling. Through these projects, I gained hands-on experience exploring datasets, visualizing data, and building predictive models.

Balasubramanian-pg/Python-Portfolio

Python Project Ideas: Mastering EDA, Web Scraping, and Data Analysis

Welcome to this collection of Python project ideas designed to help you build practical skills in Exploratory Data Analysis (EDA), Web Scraping, and Data Visualization. This repository is packed with hands-on project ideas that will guide you through real-world data manipulation and analysis tasks. Whether you're working with raw datasets, scraping data from websites, or building interactive visualizations, these projects will provide valuable insights into how data can be used to uncover trends, solve problems, and inform decisions.

Project Highlights

  1. Web Scraping: Learn how to collect and analyze data from websites using Python tools like BeautifulSoup, Scrapy, and Selenium. Discover how to clean, extract, and store web data for further analysis.
  2. Exploratory Data Analysis (EDA): Gain hands-on experience with pandas, NumPy, and Matplotlib to explore datasets, clean data, and uncover patterns through visualizations.
  3. Data Visualization: Create static charts with Matplotlib and Seaborn and interactive visualizations and dashboards with Plotly. Understand how to present data insights clearly and effectively.
  4. Data Modeling: Explore machine learning models to predict trends and forecast future outcomes based on your data.

Project Descriptions

This document serves as a central directory for a diverse collection of Python projects. The portfolio spans from fundamental data analysis and visualization to machine learning, data scraping, and automation scripts.


Advanced EDA

A project demonstrating advanced techniques in exploratory data analysis and data preprocessing.

Airbnb Analysis

An analysis of New York City Airbnb data to uncover trends and insights.

Apple Watch Data Analysis

Data processing and predictive modeling using health data from Apple Watch and Fitbit devices.

Asynchronous Dependency

Documentation outlining a task related to handling asynchronous dependencies.

Automatic Reporting

A project focused on generating automated reports from data.

Banking Crisis Tweets Analysis

A dataset containing tweets related to a banking crisis, ready for sentiment analysis or topic modeling.

Campaign Analysis

An exploratory data analysis of a marketing campaign to measure its effectiveness.

Cheat Sheets

A collection of quick-reference cheat sheets for Python and its core data science libraries.

Classification & Clustering

A project demonstrating classification and clustering machine learning techniques.

Clustered Exploratory Analysis

An analysis of store sales data using clustering methods.

Collapsed Project Structure

Documentation for a project, detailing its approach and structure.

Companies Information Dataset

A dataset containing information about various companies.

Computer Stocks Analysis

An analysis project focused on computer company stocks.

Convert Images to String (OCR)

Automation scripts that convert image files to text strings via OCR or to PDFs.

Data Scraper

A collection of web scraping tools, including scripts for IMDb and LinkedIn.

Dataset (Stock top performing 5 sectors)

A collection of datasets for top-performing stocks across five key sectors.

Email Address Scraper

A tool for scraping email addresses, with a specific implementation for Gmail.
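
As a minimal sketch of the extraction step only, the snippet below pulls email addresses out of text that has already been fetched. It is not the repository's implementation: the function name `extract_emails`, the simplified pattern, and the sample text are all assumptions made for illustration.

```python
import re

# Assumption: a simplified pattern that covers common addresses,
# not the full RFC 5322 grammar.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return unique, lowercased email addresses in order of first appearance."""
    seen = set()
    result = []
    for match in EMAIL_PATTERN.findall(text):
        address = match.lower()
        if address not in seen:
            seen.add(address)
            result.append(address)
    return result


# Hypothetical sample text standing in for scraped page content.
sample = "Contact alice@example.com or Bob@Example.org; alice@example.com is preferred."
print(extract_emails(sample))
```

Lowercasing before de-duplication treats `Bob@Example.org` and `bob@example.org` as the same address, which is a design choice rather than a requirement.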

Erling Haaland Performance Analysis

An analysis of footballer Erling Haaland's performance data.

Exploratory Data Analysis & Hypothesis Testing

A collection of notebooks demonstrating EDA and hypothesis testing.

Feature Engineering

A guide to feature engineering techniques in Python.

Fibonacci Calculation

A project demonstrating the calculation of Fibonacci numbers.
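
One common way to compute Fibonacci numbers is an iterative loop that keeps only the last two values. The sketch below assumes the usual convention F(0) = 0, F(1) = 1; the function name `fibonacci` is illustrative and may differ from the project's own code.

```python
def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number, with F(0) = 0 and F(1) = 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    # After i iterations, a holds F(i) and b holds F(i + 1).
    for _ in range(n):
        a, b = b, a + b
    return a


print([fibonacci(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The iterative form runs in linear time and constant memory, which avoids the exponential blow-up of the naive recursive definition.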

Forbes Billionaires Analysis

An in-depth analysis of the Forbes billionaires list using Seaborn for visualization.

Genome Prediction

A dataset for use in genomic prediction tasks.

Home Loan Approval Prediction

A project to predict home loan approval using XGBoost.

House Price Listing Analysis

A comprehensive exploratory data analysis of house price listings.

Indian Population Analysis

An analysis project focused on the population of India.

Internship Fittlyf

Project materials from an internship with Fittlyf.

Intro to Python

Reference guides covering the fundamentals of the Python language.

K-Means Clustering

An example of K-Means clustering applied to the Iris dataset.
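
To illustrate the algorithm itself, here is a dependency-free sketch of K-Means (Lloyd's iterations). It runs on a handful of hypothetical 2-D points rather than the Iris data, and seeding the centroids with the first `k` points is an assumption made to keep the example deterministic; in practice a library such as scikit-learn would typically be used.

```python
import math


def k_means(points, k, iterations=100):
    """Cluster points with Lloyd's algorithm; returns (centroids, labels)."""
    # Assumption: the first k points seed the centroids (deterministic, not robust).
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep the centroid of an empty cluster where it is.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two visibly separated groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)
```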

Loan Default Prediction

A machine learning project to predict loan defaults.

Market Basket Analysis

A project using association rules to perform market basket analysis.
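
Association rules are usually scored by support, confidence, and lift. The sketch below computes those three measures for single-item rules (A -> B) in plain Python. The transactions, thresholds, and the function name `association_rules` are hypothetical, and the project itself may rely on a dedicated library instead.

```python
from itertools import permutations


def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def support(itemset):
        # Fraction of baskets that contain every item in itemset.
        return sum(1 for basket in baskets if itemset <= basket) / n

    rules = []
    for a, b in permutations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        confidence = pair_support / support({a})  # estimate of P(b | a)
        if confidence >= min_confidence:
            lift = confidence / support({b})  # > 1 suggests positive association
            rules.append((a, b, pair_support, confidence, lift))
    return rules


# Hypothetical shopping baskets.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
for a, b, s, c, l in association_rules(transactions):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```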

Marketing Analytics With Meta

Resources and assessments for marketing analytics, focusing on Meta (Facebook).

Metacritic Rating Analysis

An analysis of video game ratings data from Metacritic.

Missing State Identifier

A notebook designed to identify missing state information in a dataset.

Nifty 50 Analysis

A project analyzing India's Nifty 50 stock market index.

Obesity Classification Dataset

A dataset for classifying obesity levels.

OCR Table Reader

A project that uses Optical Character Recognition (OCR) to read tables from documents.

PDF To Excel Converter

Scripts for extracting data from PDF files and converting it into Excel format using the Tabula library.

Practice Datasets

A collection of various datasets for practice and exploration.

Premier League Analysis

Data and analysis related to the English Premier League.

Red Dead Redemption 2 Analysis

An analysis project for the game Red Dead Redemption 2, including a Power BI file.

Sensitivity Analysis for Mini Project

A sensitivity analysis performed for a mini-project, complete with context and data.

Sentiment Analysis (ML Master Project)

A complete machine learning project for sentiment analysis, including preprocessing, feature extraction, training, and a server component.

Sentiment Analysis (Project Specs & Resources)

Documentation, notebooks, and resources for a sentiment analysis project, covering supervised, unsupervised, and text normalization methods.

Skincare Review NLP

A dataset of Ulta skincare reviews for Natural Language Processing (NLP) tasks.

Spotify Data Analysis & Scraping

Projects related to scraping and analyzing Spotify data.

Step Analysis

Notebooks demonstrating anomaly detection and data extraction from PDFs.
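
The notebooks' exact method is not described here, so the following is only one simple approach to anomaly detection: flag values whose z-score exceeds a threshold. The function name `zscore_anomalies`, the threshold of 2.0, and the sample values are assumptions for illustration.

```python
import statistics


def zscore_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose absolute z-score exceeds threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mean) / stdev > threshold
    ]


# Hypothetical daily readings with one obvious outlier.
readings = [5200, 4800, 5100, 4950, 5050, 15000, 5000, 4900]
print(zscore_anomalies(readings))
```

A single extreme value inflates both the mean and the standard deviation, so on small samples a median-based score is often a more robust alternative.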

Story Generator

A Python script that programmatically generates stories.

Top Mutual Funds Dataset

A comprehensive dataset of top-performing mutual funds.

Traffic Volume & Weather Prediction

A project to analyze and predict traffic volume based on weather conditions.

UK Consumer Trends (1997-2022)

An analysis of consumer spending trends in the UK over 25 years.

US Tornado Database (1950-2021)

A historical dataset of tornado occurrences in the United States.

Universities Dataset

A dataset listing top universities for Computer Science.

Wine Quality (Red) Analysis

An analysis project focused on the Red Wine Quality dataset.

YouTube Data Scraping & Analysis

Projects focused on scraping data from YouTube and performing exploratory analysis.


Tools and Libraries

To complete these projects, you’ll need several powerful tools and libraries:

  • Web Scraping: BeautifulSoup, Scrapy, Selenium, Requests
  • Data Manipulation & Analysis: pandas, NumPy
  • Data Visualization: Matplotlib, Seaborn, Plotly
  • Machine Learning: scikit-learn, TensorFlow, Keras (for predictive modeling)
  • NLP: NLTK, spaCy, TextBlob (for sentiment analysis and text classification)
  • APIs: Tweepy (Twitter API), OpenWeatherMap (weather data), Yahoo Finance (stock data)
