```python
class AkashJha:
    def __init__(self):
        self.name = "Akash Jha"
        self.role = "Data Scientist"
        self.location = "Gurugram, HR"
        self.experience = "1+ year"
        self.current_company = "ASPL"

    def get_skills(self):
        return {
            "AI/ML": ["LLMs", "RAG", "LangChain", "LangGraph", "PyTorch", "Scikit-Learn"],
            "GenAI": ["Gemini", "OpenAI", "AI Agents", "Fine-tuning"],
            "MLOps": ["Docker", "FastAPI", "AWS", "Model Monitoring", "CI/CD"],
            "Data": ["Python", "SQL", "FAISS", "Pinecone", "ETL", "Big Data"],
            "Viz": ["Power BI", "Tableau", "Streamlit", "Plotly"],
            "Tools": ["Git", "Jupyter", "VS Code", "MLflow", "Airflow"]
        }

    def get_current_focus(self):
        return [
            "🤖 Building AI-powered prescription alerting systems",
            "⚡ Optimizing real-time inference pipelines",
            "📊 Implementing semantic search with vector databases",
            "🔍 Reducing AI hallucinations with RAGAS evaluation"
        ]

    def say_hi(self):
        print("Thanks for dropping by! Let's build something amazing together 🚀")

me = AkashJha()
me.say_hi()
```

High-throughput streaming pipeline processing millions of data points daily
🔧 Tech Stack:
- Streaming: Apache Kafka, Docker
- Processing: Python, Pandas, NumPy
- Storage: PostgreSQL, Redis
- Monitoring: Grafana, Prometheus
📊 Performance:
- 10M+ daily data points processed
- 40% efficiency improvement
- Real-time insights generation
- 99.9% uptime reliability
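The core of a pipeline like this is a consumer loop that ingests records one at a time and maintains rolling aggregates. A minimal sketch of that idea in plain Python (the windowed aggregator stands in for what a real Kafka consumer would do per message; class and window size are illustrative, not from the production system):

```python
from collections import deque
from statistics import mean

class StreamingAggregator:
    """Toy sliding-window aggregator: each ingested reading updates a
    rolling mean over the last `window_size` values, the same shape of
    per-record processing a Kafka consumer loop performs."""

    def __init__(self, window_size=3):
        self.window = deque(maxlen=window_size)  # old values drop off automatically

    def ingest(self, value):
        self.window.append(value)
        return mean(self.window)

agg = StreamingAggregator(window_size=3)
readings = [10, 20, 30, 40]
averages = [agg.ingest(r) for r in readings]
print(averages)  # rolling mean over at most the last 3 readings
```

In a real deployment the `for` comprehension is replaced by a loop over `KafkaConsumer` messages, and the aggregate is pushed to Redis or surfaced in Grafana.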
Comprehensive ETL pipeline analyzing 39M housing records with interactive dashboards
🔧 Tech Stack:
- Cloud: AWS (S3, Lambda, EC2)
- Data: Snowflake, Apache Airflow
- Visualization: Power BI, Python
- Processing: Pandas, SQL
📊 Scale:
- 39M housing records processed
- 10-year price trend analysis
- Interactive Power BI dashboard
- Automated daily updates
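An ETL pipeline like this decomposes into three tasks that Airflow schedules daily. A minimal sketch of the extract → transform → load shape (the record fields and averages are made-up stand-ins, not the real Redfin schema):

```python
def extract():
    # In the real pipeline this would read Redfin exports from S3;
    # here we return hypothetical rows with string-typed prices.
    return [{"city": "Austin", "price": "450000"},
            {"city": "Austin", "price": "470000"},
            {"city": "Boise",  "price": "380000"}]

def transform(rows):
    # Cast prices to integers and compute an average per city.
    by_city = {}
    for row in rows:
        by_city.setdefault(row["city"], []).append(int(row["price"]))
    return {city: sum(prices) // len(prices) for city, prices in by_city.items()}

def load(summary, store):
    # In production this step would write to Snowflake; a dict stands
    # in for the warehouse here.
    store.update(summary)

warehouse = {}
load(transform(extract()), warehouse)
print(warehouse)  # {'Austin': 460000, 'Boise': 380000}
```

Each function maps naturally onto one Airflow task, which keeps failures isolated and retries cheap.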
- 🚨 AI Prescription Alerting System - LightGBM + LLM semantic clustering (30% manual intervention reduction)
- 🤖 GenAI Customer Service Bot - LangChain + RAG automation (80% query automation, 92% accuracy)
- 💰 Credit Risk Scoring Engine - ML-based eligibility system (18% default rate reduction)
- 🔍 Semantic Search Pipeline - OCR + FAISS + Pinecone (31% hallucination reduction)
- ⚡ Real-time Data Pipeline - Kafka + Docker processing 10M+ daily streaming data points (40% efficiency gain)
- 🏠 Redfin Housing Analytics - AWS + Snowflake + Airflow ETL pipeline for 39M housing records
- 💰 Fraud Detection System - ML models on 7M+ transactions with 15% accuracy improvement
- 🎬 Movie Recommendation Engine - NLP-based content filtering and user preference matching
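The semantic-search projects above all reduce to the same primitive: embed documents, index the vectors, and retrieve by similarity. A toy sketch of that retrieval step in plain Python (the three-dimensional vectors are hand-made stand-ins for real embeddings; FAISS or Pinecone does this at scale with approximate nearest-neighbour indexes):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical "index": document -> embedding vector.
index = {
    "aspirin dosage": [0.9, 0.1, 0.0],
    "ibuprofen interactions": [0.7, 0.6, 0.1],
    "housing prices": [0.0, 0.1, 0.9],
}

def search(query_vec, k=1):
    # Rank every indexed document by similarity to the query vector
    # and return the top-k document keys.
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(search([0.8, 0.2, 0.0]))  # nearest neighbour: 'aspirin dosage'
```

Swapping the dict for a FAISS `IndexFlatIP` (or a Pinecone namespace) changes the scale, not the shape, of the lookup.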
| 🎯 Metric | 📊 Achievement |
|---|---|
| Manual Interventions Reduced | 30% |
| Customer Query Automation | 80% |
| Model Precision Improvement | 38% |
| Hallucination Reduction | 31% |
| Default Rate Reduction | 18% |
| Processing Efficiency Gain | 40% |
```mermaid
timeline
    title Career Timeline
    3 Months : ML Intern
             : Internship Studio
             : Credit Scoring & Risk Models
             : LightGBM + SMOTE Implementation
    3 Months : Data Science Intern
             : Zidio Development
             : Built, trained, and integrated ML models
             : GenAI Chatbots & RAG Systems
    2025-Present : Data Scientist
             : ASPL
             : Turning data into decisions
             : Building, training, and integrating ML models
```
🎉 Built AI prescription alerting system - Flags high-risk prescriptions, cutting manual interventions by 30%
🚀 Deployed GenAI chatbot - Automated 80% of customer queries with 92% accuracy
⚡ Optimized real-time alerts - Sub-300ms latency with FastAPI integration
📊 Reduced AI hallucinations - 31% improvement using RAGAS evaluation framework
- 🤖 Building AI systems that actually solve real problems
- ⚡ Passionate about making AI faster and more reliable
- 📊 Love turning messy data into actionable insights
- 🔧 Always tinkering with new ML frameworks and tools
- 📚 Continuous learner - currently exploring multimodal AI
- 🎯 Goal: Make AI accessible and beneficial for everyone
💬 Open to discussing:
- AI/ML project collaborations
- GenAI and LLM implementations
- Data science consulting opportunities
- Open source contributions
- Tech talks and knowledge sharing
📧 Reach out: iamakashjha@icloud.com
🌐 Portfolio: Portfolio
⭐ If you find my work interesting, give my repositories a star!
