
Hi, I'm Robin (Duy Anh) 👋

Data Scientist | Business Analytics Graduate | Building ML Systems That Solve Real Problems

I turn messy, real-world data into production-ready machine learning systems. My edge? A business background that helps me translate technical solutions into stakeholder value—not just optimize metrics.

🎓 Master's in Business Analytics @ Sunway University (Graduating January 2026)
🌏 Seeking roles in: Malaysia | Singapore | Vietnam
🚀 Available: January 2026


💡 What Makes Me Different

I didn't start in computer science—I came from International Business, taught myself data analytics in 2021, and pursued a Master's in Business Analytics. That unconventional path means I don't just build models—I solve problems that matter to stakeholders and communicate insights people can actually use.


🚀 What I'm Working On

🏠 Property Price Prediction System – ML model for Malaysia's Klang Valley (R² = 0.97) using geospatial features and DBSCAN clustering
☁️ AWS Cloud Certifications – Expanding MLOps capabilities for scalable model deployment
📱 Building in Public – Sharing my data science journey on LinkedIn


💼 Featured Projects

🏠 Property Price Prediction System (Klang Valley)

The Challenge: Property valuations in Malaysia take days of manual research and cost RM 400-2,000+ per property.

My Solution: Built an end-to-end ML system that predicts prices in under 5 minutes with R² = 0.97. The breakthrough wasn't just the algorithm; it was solving a data quality nightmare.

Key Innovation:
Consolidated 18,000+ inconsistent location labels (misspelled road names, duplicate schemes, manual-entry errors) into 238 spatial market segments using DBSCAN clustering. This single feature-engineering step raised the model's R² from 0.84 to 0.97.
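The README doesn't show how the text labels were mapped to coordinates or which DBSCAN settings were used. As a minimal sketch of the spatial-grouping idea, the code below assumes each listing already has a latitude/longitude (for example from geocoding against OpenStreetMap) and uses an illustrative 0.5 km radius and `min_samples=3`; the function name `cluster_locations` and the coordinates are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def cluster_locations(lat_lon_deg, eps_km=0.5, min_samples=3):
    """Group (lat, lon) points into spatial segments with DBSCAN.

    Returns one label per point; -1 marks noise (a point in no dense group).
    """
    coords_rad = np.radians(np.asarray(lat_lon_deg, dtype=float))
    model = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # haversine distances are in radians, so convert km
        min_samples=min_samples,       # counts the point itself
        metric="haversine",            # expects [lat, lon] in radians
        algorithm="ball_tree",
    )
    return model.fit_predict(coords_rad)


# Hypothetical listing coordinates: two tight groups plus one isolated point.
points = [
    (3.1073, 101.6067), (3.1078, 101.6070), (3.1070, 101.6062),  # group A
    (3.0738, 101.5183), (3.0741, 101.5187), (3.0735, 101.5180),  # group B
    (3.2000, 101.7000),                                          # isolated listing
]
labels = cluster_locations(points)
n_segments = len(set(labels.tolist()) - {-1})
print(labels.tolist(), n_segments)  # [0, 0, 0, 1, 1, 1, -1] 2
```

The resulting label can then be used as a categorical "market segment" feature in place of the raw, inconsistent location strings.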

Tech Stack: Python • scikit-learn • DBSCAN • Random Forest • OpenStreetMap • Geospatial Analysis • pandas

Business Impact:
✅ Reduces valuation time from days → minutes (99% faster)
✅ Maintains R² = 0.97 on unseen 2025 data (temporal validation)
✅ Production-ready Python package with 50+ unit tests
✅ Potential cost savings: RM 150,000/month for high-volume agencies

What I Learned: Feature engineering > hyperparameter tuning. I reached R² = 0.97 with default Random Forest parameters, which suggests that careful data preparation mattered more here than algorithmic complexity.
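The README doesn't list the model's features, so here is a minimal sketch on synthetic data of the two ideas above: a temporal hold-out (train on listings before 2025, score only on 2025 listings) and a Random Forest left at its default hyperparameters. The column names `year`, `floor_area_sqft`, `segment_id`, and `price`, and the helper names, are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score


def temporal_split(df, year_col="year", test_year=2025):
    """Train on earlier years, test on one later year (no future data leaks into training)."""
    train = df[df[year_col] < test_year]
    test = df[df[year_col] == test_year]
    return train, test


def fit_and_score(df, features, target="price", test_year=2025):
    train, test = temporal_split(df, test_year=test_year)
    # Default hyperparameters; random_state only fixes the seed for reproducibility.
    model = RandomForestRegressor(random_state=42)
    model.fit(train[features], train[target])
    preds = model.predict(test[features])
    return model, r2_score(test[target], preds)


# Synthetic stand-in data, NOT the project's dataset.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "year": rng.integers(2021, 2026, size=n),        # 2021..2025 inclusive
    "floor_area_sqft": rng.uniform(500, 2500, size=n),
    "segment_id": rng.integers(0, 5, size=n),         # stand-in for a DBSCAN segment label
})
df["price"] = (
    300 * df["floor_area_sqft"] + 200_000 * df["segment_id"] + rng.normal(0, 20_000, size=n)
)

model, r2 = fit_and_score(df, ["floor_area_sqft", "segment_id"])
print(f"R² on 2025 hold-out: {r2:.3f}")
```

Splitting by year rather than at random mirrors how the model would be used: trained on past transactions and applied to new ones.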

📂 View Full Project | 📊 Technical Deep Dive


📊 Social Media Brand Perception Analysis (Uniqlo vs Muji)

The Problem: Brands need to understand how they're perceived on social media, but manual analysis doesn't scale.

My Solution: Built an NLP pipeline that processes 10,000+ social media posts to extract brand perception insights and competitive positioning.
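The sentiment model behind the pipeline isn't specified. As a minimal sketch of the brand-comparison step only, the code below scores posts with a tiny hypothetical lexicon and averages the scores per brand using naive substring matching; a real pipeline would more likely use a trained classifier or an established lexicon such as VADER.

```python
import re
from collections import defaultdict

# Toy polarity lexicon for illustration only (hypothetical, not the project's model).
LEXICON = {
    "love": 1, "great": 1, "comfy": 1, "minimal": 1, "affordable": 1,
    "thin": -1, "expensive": -1, "boring": -1, "bad": -1,
}


def score_post(text):
    """Sum lexicon polarity over lowercase word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(tok, 0) for tok in tokens)


def brand_sentiment(posts, brands):
    """Average post score per brand; a post mentioning both brands counts for both."""
    scores = defaultdict(list)
    for text in posts:
        lowered = text.lower()
        s = score_post(text)
        for brand in brands:
            if brand.lower() in lowered:  # naive case-insensitive substring match
                scores[brand].append(s)
    return {brand: sum(vals) / len(vals) for brand, vals in scores.items()}


posts = [
    "Love the new Uniqlo AIRism tee, so comfy",
    "Uniqlo jeans feel thin this season",
    "Muji stationery is minimal and great",
    "Muji furniture is too expensive and boring",
    "Uniqlo vs Muji: both great basics",
]
print(brand_sentiment(posts, ["Uniqlo", "Muji"]))  # Uniqlo ≈ 0.67, Muji ≈ 0.33
```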

Tech Stack: Python • NLP • Sentiment Analysis • pandas • Text Processing

Business Value:
✅ Automated sentiment tracking across platforms
✅ Comparative brand analysis (Uniqlo vs Muji positioning)
✅ Network analysis revealing influencer patterns
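The README doesn't describe how the network was built. One common approach, sketched here under that assumption, is a directed mention graph (author → @-mentioned account) where accounts mentioned by many distinct authors rank as likely influencers. The usernames and helper names are hypothetical.

```python
import re
from collections import Counter, defaultdict

MENTION = re.compile(r"@(\w+)")


def mention_graph(posts):
    """Build directed edges author -> mentioned account from (author, text) pairs."""
    edges = defaultdict(set)
    for author, text in posts:
        for target in MENTION.findall(text):
            if target != author:  # ignore self-mentions
                edges[author].add(target)
    return edges


def top_influencers(edges, k=3):
    """Rank accounts by in-degree: the number of distinct authors who mention them."""
    in_degree = Counter(t for targets in edges.values() for t in targets)
    return in_degree.most_common(k)


posts = [
    ("amy", "Loving my haul from @uniqlo_my, thanks @stylebyjen"),
    ("ben", "@stylebyjen's Muji desk setup is goals"),
    ("cat", "Copied @stylebyjen's capsule wardrobe with @uniqlo_my basics"),
    ("jen", "New video with @muji_my"),
]
print(top_influencers(mention_graph(posts)))
# [('stylebyjen', 3), ('uniqlo_my', 2), ('muji_my', 1)]
```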

📂 View Project


🚚 Delivery Route Optimization with Genetic Algorithms

The Problem: Logistics companies need efficient routing to minimize delivery time and fuel costs.

My Solution: Implemented a genetic algorithm solution for the Traveling Salesman Problem, optimizing delivery routes across 150+ locations in Subang, Malaysia.
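The README doesn't specify the genetic operators used. The sketch below assumes a common setup (tournament selection, order crossover, inversion mutation, and elitism) and uses straight-line distances on random toy coordinates rather than real road distances in Subang; all function names are illustrative.

```python
import math
import random


def tour_length(tour, coords):
    """Length of a closed tour (returns to the start city)."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def order_crossover(p1, p2, rng):
    """OX: copy a slice from p1, then fill the remaining cities in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    taken = set(child[a:b + 1])
    fill = iter(c for c in p2 if c not in taken)
    return [c if c is not None else next(fill) for c in child]


def mutate(tour, rng, rate=0.2):
    """Inversion mutation: reverse a random segment with probability `rate`."""
    if rng.random() < rate:
        a, b = sorted(rng.sample(range(len(tour)), 2))
        tour[a:b + 1] = reversed(tour[a:b + 1])
    return tour


def tournament(pop, coords, rng, k=3):
    """Pick the shortest of k randomly sampled tours."""
    return min(rng.sample(pop, k), key=lambda t: tour_length(t, coords))


def genetic_tsp(coords, pop_size=60, generations=150, elite=2, seed=0):
    rng = random.Random(seed)
    n = len(coords)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    initial_best = min(tour_length(t, coords) for t in pop)
    for _ in range(generations):
        pop.sort(key=lambda t: tour_length(t, coords))
        next_pop = [list(t) for t in pop[:elite]]  # elitism: best tours survive unchanged
        while len(next_pop) < pop_size:
            p1 = tournament(pop, coords, rng)
            p2 = tournament(pop, coords, rng)
            next_pop.append(mutate(order_crossover(p1, p2, rng), rng))
        pop = next_pop
    best = min(pop, key=lambda t: tour_length(t, coords))
    return best, tour_length(best, coords), initial_best


rng = random.Random(42)
cities = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(25)]
best, best_len, start_len = genetic_tsp(cities)
print(f"best of initial population: {start_len:.1f} -> after GA: {best_len:.1f}")
```

Order crossover is used because plain one-point crossover would produce tours that visit some stops twice and skip others; elitism guarantees the best route found never gets worse between generations.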

Tech Stack: Python • Genetic Algorithms • Optimization • Evolutionary Computing

Impact:
✅ Reduces total route distance by 20-30%
✅ Scalable to real-world logistics scenarios
✅ Demonstrates algorithmic problem-solving

📂 View Project


🛠️ Tech Stack

Core Skills:
Python • SQL • Machine Learning • Statistical Analysis • Data Visualization

ML & Data Science:
scikit-learn • pandas • NumPy • TensorFlow • DBSCAN • Random Forest • Feature Engineering

Visualization & BI:
Tableau • Power BI • Matplotlib • Seaborn • Plotly

Cloud & DevOps:
AWS (learning) • Git • Jupyter • VS Code • Google Colab

Domain Expertise:
Geospatial Analysis • NLP • Sentiment Analysis • Optimization Algorithms • Time Series Analysis



🎯 What I Bring to Your Team

End-to-end ML execution – From messy data to production-ready models
Business acumen – I understand stakeholder needs and translate technical insights into action
Communication skills – I explain complex concepts to non-technical audiences (demonstrated in my LinkedIn posts)
Production mindset – I write clean, tested, documented code (see my 50+ unit tests)
Continuous learning – Currently expanding into AWS/MLOps to enhance deployment capabilities


📫 Let's Connect

💼 LinkedIn: linkedin.com/in/phan-đức-duy-anh
📧 Email: duyanh.phanduc@gmail.com
🌐 GitHub: github.com/anhpdd

Currently seeking: Data Scientist | Business Intelligence Analyst roles
Available: January 2026
Locations: Malaysia | Singapore | Vietnam
Work Authorization: Graduate Pass sponsorship required


💬 Recent Highlights

📱 Building in Public: Sharing my data science journey on LinkedIn with three posts a week about ML, career lessons, and technical deep dives

🎓 Academic Recognition: Capstone project supervised by Dr. Norman Arshed & Dr. Mubbasher Munir (Sunway University)

🌱 Current Learning: AWS Cloud Practitioner certification, LLM integration with Gemini API, MLOps best practices


🔥 Fun Facts

  • 🌏 Originally from International Business → self-taught analytics → Master's in Business Analytics
  • 📚 Started learning data science on DataCamp in 2021
  • 🗺️ Fascinated by geospatial analytics and how location data shapes decisions
  • ☕ Best ideas come at 2 AM during debugging sessions

"Data science isn't just about algorithms—it's about solving real problems end-to-end."


⭐ If you found my work interesting, consider giving my repos a star!

💼 Open to collaboration, mentorship, and full-time opportunities starting January 2026.
