I’m an AI/ML-focused software engineer who builds production-grade AI systems end to end, from model training and inference to mobile apps, APIs, and cloud deployment.
- Senior at Bennington College studying Computer Science & Mathematics
- Focus: Applied Machine Learning, AI Infrastructure, and Full-Stack Software Engineering
- Strong interest in LLM systems, multimodal AI, mobile-first products, and scalable AI platforms
- Designing LLM-powered systems (RAG, agents, embeddings)
- Building mobile & web apps backed by AI services
- Shipping real-time inference APIs (speech, vision, NLP)
- Optimizing ML systems for latency, cost, and scale
- Deploying to serverless, containerized, and GPU-backed infra
- PyTorch, TensorFlow, Keras
- Hugging Face (Transformers, Datasets, Accelerate)
- Classical ML (scikit-learn, XGBoost)
- Computer Vision (OpenCV)
- Speech & Audio ML (Whisper, speaker diarization, embeddings)
- Model optimization: quantization, mixed precision, batching, GPU/CPU tradeoffs
- Retrieval-Augmented Generation (RAG)
- Embeddings & semantic search
- Prompt engineering & evaluation
- Tool-calling and agent workflows
- Context window + latency optimization
Vector Databases
- Pinecone
- Weaviate
- FAISS
- PostgreSQL + pgvector
- Built mobile apps using Expo + React Native with native iOS integrations
- Designed AI-backed mobile flows (voice, chat, search)
- State management, API integration, auth, and performance tuning
- Deployed mobile + web apps with shared TypeScript codebases
- API design for ML & mobile clients
- Real-time streaming inference (audio/text)
- Auth, rate limiting, and observability
- Type-safe backend development
- Experiment tracking & reproducibility
- Model versioning & evaluation pipelines
- Dataset curation, bias analysis, and monitoring
- Relational + NoSQL schema design
- Caching & low-latency reads
- Vector search + hybrid retrieval
- Serverless + edge deployment (Vercel)
- GPU-backed inference services
- CI/CD for ML and app code
- Infrastructure as Code
Production-ready voice AI system
- Built a unified speech platform combining:
  - Whisper ASR
  - Speaker diarization
  - Speaker embedding–based verification
- Designed for real-time and batch inference with streaming support
- Deployed via containerized GPU services with autoscaling
- Reduced inference cost using quantization and mixed precision
- Exposed secure REST APIs consumed by web and mobile clients
Equitable AI for Dermatology
- Built models that perform equitably across Fitzpatrick skin types I–VI
- Implemented group-aware sampling and reweighted losses
- Applied calibration, test-time augmentation, and ensembling
- Ranked 2nd out of 300+ teams worldwide
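
For readers unfamiliar with reweighted losses, here is a minimal sketch of one common approach: inverse-frequency group weights applied to a negative log-likelihood. It is an illustration under assumptions, not the code used in this project; the function names `group_weights` and `reweighted_nll` and the toy data are hypothetical.

```python
import math
from collections import Counter


def group_weights(groups):
    """Inverse-frequency weight per group, scaled so the mean sample weight is 1."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return {g: n / (k * c) for g, c in counts.items()}


def reweighted_nll(probs, labels, groups, weights):
    """Weighted mean negative log-likelihood.

    probs[i][labels[i]] is the predicted probability of sample i's true class.
    """
    total = sum(
        weights[g] * -math.log(p[y]) for p, y, g in zip(probs, labels, groups)
    )
    return total / sum(weights[g] for g in groups)


# Hypothetical toy batch: three samples from group "I", one from group "VI".
groups = ["I", "I", "I", "VI"]
labels = [0, 1, 0, 1]
probs = [[0.5, 0.5]] * 4

weights = group_weights(groups)  # the rarer group gets the larger weight
loss = reweighted_nll(probs, labels, groups, weights)
```

The same per-group weights could also serve as sampling probabilities for group-aware sampling, so that under-represented skin types appear more often in each batch.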
- 💼 LinkedIn: https://linkedin.com/in/malikrohail
- 📧 Email: malikrohail525@gmail.com


