KaustubhSN12/README.md

Hi 👋, I'm Kaustubh

Data Science Student | Data Analyst | Business Analyst | Machine Learning Engineer | Research Enthusiast

kaustubhsn12 LinkedIn Kaggle


👨‍💻 About Me

Data Science Student at S.I.E.S College of Arts, Science and Commerce (Autonomous), Mumbai
Graduating: 2026

I'm a data science practitioner who builds end-to-end ML solutions, from exploratory analysis and model development to statistical validation and production deployment. My work spans time series forecasting, natural language processing, predictive analytics, and interactive data applications.

  • 🔬 Research Areas: Deep Learning, Time Series, NLP, Sports Analytics
  • 🛠️ Technical Focus: PyTorch, Feature Engineering, Model Optimization, Deployment
  • 📊 Approach: Data-driven decision making with rigorous validation
  • 🎯 Passion: Solving real-world business problems with AI and statistical methods
  • 💼 Status: Open to ML Engineering, Data Science, and Research opportunities

🚀 Featured Projects

📈 Air Quality Index Forecasting System

24-hour AQI predictions across 10 Indian cities using Temporal Fusion Transformer

  • Built a synthetic dataset from source data with 99.7% missing values, using Gaussian Process Regression
  • Achieved a 96.5% RMSE reduction through iterative model optimization (v1 → v3)
  • Deployed an interactive Streamlit dashboard with probabilistic forecasts (Q10/Q50/Q90)
  • Tech: PyTorch, TFT Architecture, Scikit-learn, Streamlit, Plotly
  • Results: R² = 0.9584, statistically validated (p < 0.05)
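
The figures above (RMSE reduction, R², and Q10/Q50/Q90 forecasts) can be illustrated with a minimal sketch. The README does not show the project's evaluation code or training loss, so everything here is an assumption: the function names are hypothetical, the pinball (quantile) loss is shown because it is the loss commonly used to train quantile forecasts such as those of a Temporal Fusion Transformer, and the numbers are toy values invented for illustration, not project results.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pinball_loss(y_true, y_pred, quantile):
    """Mean quantile (pinball) loss for one quantile level in (0, 1)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        error = t - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(quantile * error, (quantile - 1.0) * error)
    return total / len(y_true)


def percent_reduction(baseline, improved):
    """Relative reduction of a metric between two model versions, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Toy values invented for illustration only; these are not project results.
actual = [100.0, 120.0, 140.0, 160.0]
median_forecast = [102.0, 118.0, 143.0, 157.0]

print("RMSE:", rmse(actual, median_forecast))
print("R^2:", r_squared(actual, median_forecast))
print("Pinball loss at Q50:", pinball_loss(actual, median_forecast, 0.5))
```

A "v1 → v3" style comparison would then call `percent_reduction(rmse_v1, rmse_v3)` on the two models' RMSE values, evaluated on the same held-out data.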

📂 View Project | 📊 Live Demo


🛡️ Phishing Email Detection System

Multi-model NLP architecture for email security using BERT and LSTM

  • Implemented dual-model approach combining contextual embeddings (BERT) and sequential patterns (LSTM)
  • Engineered features from email headers, body text, and metadata
  • Built robust preprocessing pipeline for handling diverse email formats
  • Tech: BERT, LSTM, PyTorch, NLP, Scikit-learn
  • Application: Cybersecurity, Email Filtering
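
As a rough illustration of engineering features from email headers, body text, and metadata, here is a minimal sketch that uses only the Python standard library. It is not the project's pipeline: the feature names, the keyword list, and the sample message are all hypothetical, and a real system would feed richer inputs into the BERT and LSTM models.

```python
import re
from email import message_from_string
from email.utils import parseaddr

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
# Hypothetical keyword list, chosen only for illustration.
URGENCY_WORDS = ("urgent", "verify", "suspended", "immediately")


def extract_features(raw_email):
    """Return a dict of simple numeric features from a raw email message."""
    message = message_from_string(raw_email)
    subject = message.get("Subject", "") or ""
    _, from_addr = parseaddr(message.get("From", ""))
    _, reply_addr = parseaddr(message.get("Reply-To", ""))

    # Collect the text/plain parts; a non-multipart message yields itself.
    body_parts = []
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload()
            if isinstance(payload, str):
                body_parts.append(payload)
    body = "\n".join(body_parts)

    text = (subject + " " + body).lower()
    return {
        "num_urls": len(URL_PATTERN.findall(body)),
        # 1 when a Reply-To address exists and differs from the From address.
        "reply_to_mismatch": int(
            bool(reply_addr) and reply_addr.lower() != from_addr.lower()
        ),
        "urgency_hits": sum(text.count(word) for word in URGENCY_WORDS),
        "body_length": len(body),
    }


# Made-up sample message using reserved example domains.
sample = (
    "From: Support <support@example.com>\n"
    "Reply-To: attacker@example.net\n"
    "Subject: Urgent: verify your account\n"
    "\n"
    "Your account is suspended. Visit http://example.net/login immediately.\n"
)
features = extract_features(sample)
print(features)
```

Hand-built features like these are usually concatenated with learned text representations or used as a baseline to compare the neural models against.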

📂 View Project


⚽ FIFA World Cup 2022 Outcome Prediction

Sports analytics and match outcome forecasting using historical data

  • Analyzed historical FIFA World Cup data with statistical modeling
  • Developed predictive models for match outcomes and tournament progression
  • Engineered features from team statistics, player performance, and historical matchups
  • Tech: Python, Scikit-learn, Pandas, Statistical Analysis
  • Domain: Sports Analytics, Predictive Modeling
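
One way features from historical matchups might be derived is sketched below. This is an assumption-laden illustration, not the project's code: the tuple layout, the function names, and the three features are hypothetical, and the match list uses placeholder teams with invented scores rather than real results.

```python
from collections import defaultdict


def team_win_rates(matches):
    """Map each team to its share of wins over all matches it played."""
    played = defaultdict(int)
    won = defaultdict(int)
    for home, away, home_goals, away_goals in matches:
        played[home] += 1
        played[away] += 1
        if home_goals > away_goals:
            won[home] += 1
        elif away_goals > home_goals:
            won[away] += 1
    return {team: won[team] / played[team] for team in played}


def matchup_features(matches, team_a, team_b):
    """Build simple features describing a team_a vs team_b fixture."""
    rates = team_win_rates(matches)
    head_to_head = [m for m in matches if {m[0], m[1]} == {team_a, team_b}]
    a_wins = sum(
        1
        for home, away, home_goals, away_goals in head_to_head
        if (home == team_a and home_goals > away_goals)
        or (away == team_a and away_goals > home_goals)
    )
    return {
        "win_rate_diff": rates.get(team_a, 0.0) - rates.get(team_b, 0.0),
        "h2h_matches": len(head_to_head),
        # Fall back to a neutral 0.5 when the teams have never met.
        "h2h_win_share_a": a_wins / len(head_to_head) if head_to_head else 0.5,
    }


# Placeholder teams and invented scores: (home, away, home_goals, away_goals).
history = [
    ("Team A", "Team B", 2, 1),
    ("Team B", "Team A", 0, 0),
    ("Team A", "Team C", 1, 3),
    ("Team C", "Team B", 2, 2),
]
print(matchup_features(history, "Team A", "Team B"))
```

Rows of such features, one per fixture, could then be passed to a Scikit-learn classifier to predict win, draw, or loss.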

📂 View Project


💻 Technical Skills

Languages & Core

Python R SQL JavaScript

Machine Learning & Deep Learning

  • Frameworks: PyTorch, TensorFlow, Scikit-learn, Keras
  • Architectures: Transformers (TFT, BERT), LSTM, CNN, Ensemble Methods
  • Techniques: Time Series Forecasting, NLP, Feature Engineering, Model Optimization
  • Specialization: Uncertainty Quantification, Multi-horizon Forecasting, Transfer Learning

Data Science Stack

  • Analysis: Pandas, NumPy, Statistical Methods, Hypothesis Testing
  • Visualization: Matplotlib, Plotly, Seaborn, Power BI
  • Preprocessing: Feature Engineering, Data Wrangling, Synthetic Data Generation
  • Statistical Tools: Gaussian Processes, Bootstrap Methods, A/B Testing

Deployment & Tools

Streamlit Git Linux Jupyter

Databases & Web

  • Databases: MySQL, Oracle, Advanced DBMS
  • Web Development: HTML, CSS, JavaScript, Bootstrap
  • Mobile: Android Studio, Java (Beginner)

📚 Domain Knowledge

🧠 Machine Learning

  • Supervised & Unsupervised Learning
  • Deep Neural Networks
  • Model Selection & Validation
  • Hyperparameter Optimization

📊 Data Analytics

  • Exploratory Data Analysis
  • Statistical Inference
  • Data Visualization
  • Business Intelligence

🔍 Specialized Skills

  • Time Series Analysis
  • Natural Language Processing
  • Computer Vision (OpenCV)
  • Research Methodology

🚀 Production ML

  • Model Deployment
  • Interactive Dashboards
  • Performance Monitoring
  • Version Control

📊 GitHub Statistics

GitHub Stats GitHub Streak

Top Languages


🌱 What I'm Learning

  • Advanced ML: Transformers (GPT, Vision Transformers), Graph Neural Networks
  • MLOps: Model Monitoring, CI/CD for ML, Experiment Tracking
  • Big Data: Distributed Computing, Spark, Scalable ML Systems
  • Specialized Topics: Reinforcement Learning, Federated Learning, AutoML

🎯 Research Interests

  • Time Series Forecasting: Multi-horizon predictions, Uncertainty quantification
  • Natural Language Processing: Transformers, Sentiment analysis, Text generation
  • Environmental AI: Climate modeling, Pollution forecasting, Sustainability applications
  • Sports Analytics: Predictive modeling, Performance optimization
  • Explainable AI: Model interpretability, Feature importance, Trust in ML

📫 Connect With Me

I'm actively seeking opportunities where I can apply my data science and machine learning skills to solve impactful real-world problems.

Looking for:

  • Data Science / ML Engineering roles
  • Research collaborations
  • Open-source contributions
  • Kaggle competitions

linkedin twitter kaggle email

📧 Email: kaustubh.n007@gmail.com
📄 Resume: View Resume
👨‍💻 Portfolio: All Projects


💡 "Data is the new oil, but insights are the refined fuel" 💡

Trophies


visitor count

Pinned

  1. Fifa-worldcup-Qatar-2022-prediction-system- Public

    An intelligent match outcome predictor for the FIFA World Cup 2022 using machine learning. This project analyzes historical team performance data, FIFA rankings, and key match statistics to forecas…

    Jupyter Notebook

  2. Power-BI_salary-gender-family-trends Public

    Power BI Salary, Gender, and Family Trends: explore trends in salaries across genders and family dynamics with this Power BI dashboard repository.

  3. PwC_PowerBi_Job-Simulation Public

  4. Implement-Machine-Learning-with-Spark-or-Hadoop_BDA_LR Public

    Linear Regression machine learning algorithm used with PySpark

    Jupyter Notebook

  5. Kmeans_Cluster_Exercise_ML Public

    Jupyter Notebook

  6. Principle-Component-Analysis-PCA-_Exercise_ML Public

    Exercise 10

    Jupyter Notebook