Data Scientist | Business Analytics Graduate | Building ML Systems That Solve Real Problems
I turn messy, real-world data into production-ready machine learning systems. My edge? A business background that helps me translate technical solutions into stakeholder value—not just optimize metrics.
🎓 Master's in Business Analytics @ Sunway University (Graduating January 2026)
🌏 Seeking roles in: Malaysia | Singapore | Vietnam
🚀 Available: January 2026
I didn't start in computer science—I came from International Business, taught myself data analytics in 2021, and pursued a Master's in Business Analytics. That unconventional path means I don't just build models—I solve problems that matter to stakeholders and communicate insights people can actually use.
🏠 Property Price Prediction System – ML model for Malaysia's Klang Valley reaching R² = 0.97 with geospatial features and DBSCAN clustering
☁️ AWS Cloud Certifications – Expanding MLOps capabilities for scalable model deployment
📱 Building in Public – Sharing my data science journey on LinkedIn
The Challenge: Property valuations in Malaysia take days of manual research and cost RM 400-2,000+ per property.
My Solution: Built an end-to-end ML system that predicts prices in under 5 minutes and explains 97% of price variance (R² = 0.97). The breakthrough wasn't just the algorithm; it was solving a data quality nightmare.
Key Innovation:
Consolidated 18,000+ inconsistent location labels (misspelled road names, duplicate schemes, manual entry errors) into 238 spatial market segments using DBSCAN clustering. This single feature engineering step improved model fit (R²) from 0.84 to 0.97.
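A minimal sketch of how this kind of consolidation can work, assuming each listing has already been geocoded to latitude/longitude (for example via OpenStreetMap). The function name, the `eps_km` and `min_samples` values, and the coordinates are illustrative assumptions, not the project's actual code or settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088


def assign_market_segments(lat_lon_deg, eps_km=0.5, min_samples=3):
    """Cluster (lat, lon) points given in degrees.

    Returns one integer label per point; -1 marks noise (no segment).
    eps_km and min_samples are placeholder values, not tuned settings.
    """
    coords_rad = np.radians(np.asarray(lat_lon_deg, dtype=float))
    model = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # haversine distances are in radians
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",
    )
    return model.fit_predict(coords_rad)


# Made-up coordinates: two tight groups plus one isolated point
points = [
    (3.0730, 101.6060), (3.0732, 101.6063), (3.0728, 101.6058),  # group A
    (3.1570, 101.7120), (3.1573, 101.7117), (3.1568, 101.7123),  # group B
    (2.9000, 101.4000),                                          # isolated
]
labels = assign_market_segments(points)
print(labels)
```

The resulting label can replace the free-text location column as a categorical feature, so misspelled or duplicated names that sit in the same place end up in the same segment.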
Tech Stack: Python • scikit-learn • DBSCAN • Random Forest • OpenStreetMap • Geospatial Analysis • pandas
Business Impact:
✅ Cuts valuation time from days to minutes (over 99% less time)
✅ Maintains R² = 0.97 on unseen 2025 data (temporal validation)
✅ Production-ready Python package with 50+ unit tests
✅ Potential cost savings: RM 150,000/month for high-volume agencies
What I Learned: Feature engineering > hyperparameter tuning. I reached R² = 0.97 with default Random Forest parameters, which suggests that smart data preparation matters more than complex algorithms.
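For readers unfamiliar with temporal validation, here is a minimal sketch of the idea: fit a Random Forest with library defaults on earlier years and score it on the latest year only. The data is synthetic and the feature names (`area`, `segment`) are assumptions for illustration; this is not the project's dataset or pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumed features): floor area and segment id -> price
n = 600
year = rng.integers(2021, 2026, size=n)   # transaction years 2021..2025
area = rng.uniform(500, 3000, size=n)     # floor area in sq ft
segment = rng.integers(0, 5, size=n)      # spatial segment id
price = 300 * area + 50_000 * segment + rng.normal(0, 20_000, size=n)

X = np.column_stack([area, segment])
train = year < 2025   # fit only on transactions before the held-out year
test = ~train         # evaluate only on the latest year

# random_state is fixed for reproducibility; everything else is the default
model = RandomForestRegressor(random_state=0)
model.fit(X[train], price[train])

r2 = r2_score(price[test], model.predict(X[test]))
print(f"R^2 on held-out year: {r2:.3f}")
```

Splitting by time instead of at random keeps future transactions out of training, which is closer to how a valuation model is used in practice.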
📂 View Full Project | 📊 Technical Deep Dive
The Problem: Brands need to understand how they're perceived on social media, but manual analysis doesn't scale.
My Solution: Built an NLP pipeline that processes 10,000+ social media posts to extract brand perception insights and competitive positioning.
Tech Stack: Python • NLP • Sentiment Analysis • pandas • Text Processing
Business Value:
✅ Automated sentiment tracking across platforms
✅ Comparative brand analysis (Uniqlo vs Muji positioning)
✅ Network analysis revealing influencer patterns
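A minimal sketch of per-brand sentiment scoring, using a tiny hand-written word list. The lexicon, the function names, and the example posts are all invented for illustration; the project's actual sentiment method and data are not described here and may differ.

```python
import re
from collections import defaultdict

# Toy lexicon (hypothetical); a real pipeline would use a proper model or lexicon
POSITIVE = {"love", "great", "comfortable", "affordable"}
NEGATIVE = {"expensive", "boring", "poor", "disappointed"}


def score_post(text):
    """Return (#positive words - #negative words) for one post."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg


def mean_sentiment_by_brand(posts):
    """Average the post scores for each brand in a list of (brand, text) pairs."""
    scores = defaultdict(list)
    for brand, text in posts:
        scores[brand].append(score_post(text))
    return {brand: sum(s) / len(s) for brand, s in scores.items()}


# Invented example posts, not real data
posts = [
    ("Uniqlo", "Love the new jackets, great and affordable"),
    ("Uniqlo", "Sizes ran out, a bit disappointed"),
    ("Muji", "Comfortable basics but expensive"),
    ("Muji", "Great stationery"),
]
result = mean_sentiment_by_brand(posts)
print(result)  # {'Uniqlo': 1.0, 'Muji': 0.5}
```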
The Problem: Logistics companies need efficient routing to minimize delivery time and fuel costs.
My Solution: Implemented a genetic algorithm solution for the Traveling Salesman Problem, optimizing delivery routes across 150+ locations in Subang, Malaysia.
Tech Stack: Python • Genetic Algorithms • Optimization • Evolutionary Computing
Impact:
✅ Reduces total route distance by 20-30%
✅ Scalable to real-world logistics scenarios
✅ Demonstrates algorithmic problem-solving
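A minimal sketch of a genetic algorithm for the Traveling Salesman Problem, using ordered crossover, swap mutation, and elitist selection. These operator choices, the parameter values, and the random coordinates are assumptions for illustration; the project's actual operators and settings are not stated here.

```python
import math
import random


def route_length(route, coords):
    """Total length of the closed tour visiting coords in the given order."""
    return sum(
        math.dist(coords[route[i]], coords[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )


def ordered_crossover(p1, p2, rng):
    """Copy a slice from p1, then fill the remaining slots in p2's order."""
    a, b = sorted(rng.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b + 1] = p1[a:b + 1]
    kept = set(p1[a:b + 1])
    fill = iter(city for city in p2 if city not in kept)
    return [c if c is not None else next(fill) for c in child]


def swap_mutation(route, rng, rate=0.2):
    """With probability `rate`, swap two randomly chosen stops."""
    route = route[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(route)), 2)
        route[i], route[j] = route[j], route[i]
    return route


def solve_tsp(coords, pop_size=60, generations=200, seed=0):
    rng = random.Random(seed)
    n = len(coords)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda r: route_length(r, coords))
        elite = population[: pop_size // 4]  # keep the best quarter unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(swap_mutation(ordered_crossover(p1, p2, rng), rng))
        population = elite + children
    return min(population, key=lambda r: route_length(r, coords))


# Random stand-in stops on a unit square (not real delivery locations)
rng = random.Random(1)
coords = [(rng.random(), rng.random()) for _ in range(15)]
best = solve_tsp(coords)
baseline = list(range(len(coords)))  # visit stops in their listed order
print(round(route_length(baseline, coords), 2), round(route_length(best, coords), 2))
```

Keeping the elite unchanged each generation means the best route found so far is never lost, at the cost of some diversity in the population.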
Core Skills:
Python • SQL • Machine Learning • Statistical Analysis • Data Visualization
ML & Data Science:
scikit-learn • pandas • NumPy • TensorFlow • DBSCAN • Random Forest • Feature Engineering
Visualization & BI:
Tableau • Power BI • Matplotlib • Seaborn • Plotly
Cloud & DevOps:
AWS (learning) • Git • Jupyter • VS Code • Google Colab
Domain Expertise:
Geospatial Analysis • NLP • Sentiment Analysis • Optimization Algorithms • Time Series Analysis
✅ End-to-end ML execution – From messy data to production-ready models
✅ Business acumen – I understand stakeholder needs and translate technical insights into action
✅ Communication skills – I explain complex concepts to non-technical audiences (proven through LinkedIn content)
✅ Production mindset – I write clean, tested, documented code (see my 50+ unit tests)
✅ Continuous learning – Currently expanding into AWS/MLOps to enhance deployment capabilities
💼 LinkedIn: linkedin.com/in/phan-đức-duy-anh
📧 Email: duyanh.phanduc@gmail.com
🌐 GitHub: github.com/anhpdd
Currently seeking: Data Scientist | Business Intelligence Analyst roles
Available: January 2026
Locations: Malaysia | Singapore | Vietnam
Work Authorization: Graduate Pass sponsorship required
📱 Building in Public: Sharing my data science journey on LinkedIn with 3x weekly posts about ML, career lessons, and technical deep-dives
🎓 Academic Recognition: Capstone project supervised by Dr. Norman Arshed & Dr. Mubbasher Munir (Sunway University)
🌱 Current Learning: AWS Cloud Practitioner certification, LLM integration with Gemini API, MLOps best practices
- 🌏 Originally from International Business → self-taught analytics → Master's in Business Analytics
- 📚 Started learning data science on DataCamp in 2021
- 🗺️ Fascinated by geospatial analytics and how location data shapes decisions
- ☕ Best ideas come at 2 AM during debugging sessions
"Data science isn't just about algorithms—it's about solving real problems end-to-end."
⭐ If you found my work interesting, consider giving my repos a star!
💼 Open to collaboration, mentorship, and full-time opportunities starting January 2026.
