Applied AI Engineer | ML Platform | Agentic Workflows | LLM Pretraining, Fine-Tuning, RLHF | MLOps
I build production-grade ML and AI systems across workflow orchestration, RAG and agent runtimes, LLM pretraining and post-training, release governance, and Kubernetes-native platform operations.
My work centers on:
- Agentic workflow platforms with typed execution contracts, tools, capabilities, memory, and triggers
- LLM and RAG systems with evaluation, observability, and operational guardrails
- LLM training and adaptation workflows including pretraining, supervised fine-tuning, and RLHF-style preference optimization
- ML platform and release pipelines with measurable quality gates and controlled promotion
- Local-first developer workflows that map cleanly to cloud and Kubernetes production patterns
All public repositories use educational, synthetic, or non-sensitive data only.
- AI workflow platforms and orchestration runtimes
- Tool-using agent and retrieval-augmented systems
- LLM pretraining, fine-tuning, and RLHF-style alignment workflows
- ML and LLM release governance pipelines
- Observability-first platform services
A full-stack platform for authoring and running AI-powered workflows through chat and a visual DAG editor, backed by typed execution contracts, reusable capabilities, memory, triggers, and Kubernetes-native orchestration.
Highlights
- Chat, Compose, and Workflow Studio surfaces for conversational, goal-driven, and manually authored workflows
- Typed planner/worker runtime with reusable capabilities, memory integration, control-flow nodes, retries, and DLQ recovery
- Kubernetes-ready scaling, artifact/document handling, and observability with Prometheus, Grafana, Loki, and Jaeger
Repo: https://github.com/narendersurabhi/agentic-workflow-studio
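As a rough illustration of the retry and DLQ recovery idea mentioned in the highlights, here is a minimal sketch. It is not taken from the repository: `Task`, `Worker`, `max_attempts`, and `dead_letters` are hypothetical names, and the in-memory list stands in for whatever queue the real runtime uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Task:
    """Hypothetical unit of work handed to a worker."""
    task_id: str
    payload: dict


@dataclass
class Worker:
    """Runs a handler with bounded retries; exhausted tasks go to a dead-letter list."""
    handler: Callable[[Task], dict]
    max_attempts: int = 3
    # In-memory stand-in for a dead-letter queue (assumption for this sketch).
    dead_letters: List[Tuple[Task, str]] = field(default_factory=list)

    def run(self, task: Task) -> Optional[dict]:
        last_error = ""
        for attempt in range(1, self.max_attempts + 1):
            try:
                return self.handler(task)
            except Exception as exc:  # sketch only: real code would narrow this
                last_error = f"attempt {attempt}: {exc}"
        # All attempts failed: park the task so it can be inspected or replayed.
        self.dead_letters.append((task, last_error))
        return None
```

A caller can later drain `dead_letters` and call `run` again on each task once the underlying fault is fixed, which is the recovery half of the pattern.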
A Kubernetes-first release governance project focused on controlled promotion, plan/execute workflows, policy gating, and observable ML operations.
Highlights
- Release-gate workflow for evaluation, approval, and promotion
- MLflow-backed artifacts and runtime promotion decisions
- Prometheus and OpenTelemetry instrumentation with Jaeger tracing
Repo: https://github.com/narendersurabhi/mlops-release-gate-agent
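To make the policy-gating idea concrete, the sketch below shows one way a release gate could compare evaluation metrics against thresholds before promotion. The rule names, metric names, and threshold values are illustrative assumptions, not the repository's actual policy format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GateRule:
    """Hypothetical gate rule: one metric compared against one threshold."""
    metric: str
    threshold: float
    higher_is_better: bool = True


def evaluate_gate(metrics: Dict[str, float], rules: List[GateRule]) -> Tuple[bool, List[str]]:
    """Return (approved, failure reasons). A missing metric counts as a failure."""
    failures: List[str] = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            failures.append(f"{rule.metric}: missing")
            continue
        passed = value >= rule.threshold if rule.higher_is_better else value <= rule.threshold
        if not passed:
            failures.append(f"{rule.metric}: {value} vs threshold {rule.threshold}")
    return (not failures, failures)


# Example policy with made-up values for illustration only.
RULES = [
    GateRule("accuracy", 0.90),
    GateRule("p95_latency_ms", 250.0, higher_is_better=False),
]
```

Treating a missing metric as a failure keeps the gate closed by default, so a candidate cannot be promoted simply because an evaluation step did not run.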
A production-style MCP server with multiple transports, auth policy controls, and observability built in from the start.
Highlights
- Stdio and HTTP MCP transport support
- Scope-based bearer-token authorization model
- OpenTelemetry and Prometheus instrumentation for runtime visibility
Repo: https://github.com/narendersurabhi/mcp-control-plane
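A scope-based bearer-token check can be sketched in a few lines. This is an assumption-laden illustration rather than the server's actual code: the token table, the scope strings such as `tools:read`, and the function names are all hypothetical, and a real deployment would load tokens from a secret store instead of a module-level dict.

```python
import hmac
from typing import Dict, FrozenSet, Optional

# Hypothetical static token table: token -> granted scopes.
TOKENS: Dict[str, FrozenSet[str]] = {
    "reader-token": frozenset({"tools:read"}),
    "admin-token": frozenset({"tools:read", "tools:call"}),
}


def parse_bearer(header: Optional[str]) -> Optional[str]:
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    if not header:
        return None
    scheme, _, token = header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def authorize(header: Optional[str], required_scope: str,
              tokens: Dict[str, FrozenSet[str]] = TOKENS) -> bool:
    """Allow the request only if the token is known and carries the required scope."""
    token = parse_bearer(header)
    if token is None:
        return False
    for known, scopes in tokens.items():
        # Constant-time comparison on bytes to avoid leaking token contents via timing.
        if hmac.compare_digest(known.encode(), token.encode()):
            return required_scope in scopes
    return False
```

The same `authorize` function can sit in front of either transport, since it only depends on the header value and the scope a given operation requires.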
An end-to-end repository for LLM customization workflows spanning fine-tuning, RLHF-style preference optimization, evaluation, and serving.
Highlights
- LoRA and QLoRA supervised fine-tuning flows
- DPO-style preference optimization and evaluation gates
- Production-facing API and observability patterns for model serving
Repo: https://github.com/narendersurabhi/llm-customization-ops
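For readers unfamiliar with DPO-style preference optimization, the per-example loss can be written without any framework. The sketch below uses plain Python floats for the summed log-probabilities of the chosen and rejected responses; the function name and the default `beta=0.1` are assumptions for illustration, and the repository's training code may differ.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    margin is how much more the policy prefers the chosen response over the
    rejected one, relative to the frozen reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)); computed in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy matches the reference, the margin is zero and the loss is `log(2)`; the loss falls as the policy shifts probability toward the chosen response.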
A reference implementation for model governance and promotion across training, evaluation, registry flow, and serving.
Highlights
- Train, evaluate, promote, and serve workflow
- Artifact- and model-registry-oriented release path
- FastAPI serving with Prometheus and Grafana-compatible metrics
Repo: https://github.com/narendersurabhi/ml-platform-release-gates
- Contracts-first design for predictable component boundaries
- LLM lifecycle coverage from training and post-training through evaluation and serving
- Evidence-based release decisions using measurable criteria
- Operability as a hard requirement, not a follow-on task
- Reproducible local and CI workflows with deterministic test paths
- Platform and API: Python, FastAPI, Docker, Kubernetes, Helm, MLflow
- Observability: OpenTelemetry, Prometheus, Grafana, Jaeger, Loki
- LLM systems: pretraining concepts, supervised fine-tuning, RLHF and preference optimization, RAG, tool-calling agents, workflow orchestration, evaluation pipelines
- ML and data: PyTorch, XGBoost, PySpark, classical ML pipelines
- Applied AI Engineer
- AI Platform Engineer
- ML Platform Engineer
- LinkedIn: https://www.linkedin.com/in/narendersurabhi
- GitHub: https://github.com/narendersurabhi
- Location: Okemos, MI


