narendersurabhi/README.md

Narender Rao Surabhi

Applied AI Engineer | ML Platform | Agentic Workflows | LLM Pretraining, Fine-Tuning, RLHF | MLOps

I build production-grade ML and AI systems across workflow orchestration, RAG and agent runtimes, LLM pretraining and post-training, release governance, and Kubernetes-native platform operations.

My work is centered on:

  • Agentic workflow platforms with typed execution contracts, tools, capabilities, memory, and triggers
  • LLM and RAG systems with evaluation, observability, and operational guardrails
  • LLM training and adaptation workflows including pretraining, supervised fine-tuning, and RLHF-style preference optimization
  • ML platform and release pipelines with measurable quality gates and controlled promotion
  • Local-first developer workflows that map cleanly to cloud and Kubernetes production patterns

All public repositories use educational, synthetic, or non-sensitive data only.

What I build

  • AI workflow platforms and orchestration runtimes
  • Tool-using agent and retrieval-augmented systems
  • LLM pretraining, fine-tuning, and RLHF-style alignment workflows
  • ML and LLM release governance pipelines
  • Observability-first platform services

Featured Projects

1) Agentic Workflow Studio

A full-stack platform for authoring and running AI-powered workflows through chat and a visual DAG editor, backed by typed execution contracts, reusable capabilities, memory, triggers, and Kubernetes-native orchestration.

Highlights

  • Chat, Compose, and Workflow Studio surfaces for conversational, goal-driven, and manually authored workflows
  • Typed planner/worker runtime with reusable capabilities, memory integration, control-flow nodes, retries, and DLQ recovery
  • Kubernetes-ready scaling, artifact/document handling, and observability with Prometheus, Grafana, Loki, and Jaeger

Repo: https://github.com/narendersurabhi/agentic-workflow-studio
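
The "retries and DLQ recovery" highlight above can be illustrated with a minimal sketch. It is not taken from the repository: `Task`, `RetryRunner`, and the in-memory `dead_letters` list are hypothetical names, and a real runtime would more likely use a durable queue.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Task:
    # Hypothetical task envelope; the real platform's contract may differ.
    task_id: str
    payload: dict
    attempts: int = 0


@dataclass
class RetryRunner:
    max_attempts: int = 3
    # In-memory stand-in for a dead-letter queue (assumption for illustration).
    dead_letters: List[Tuple[Task, str]] = field(default_factory=list)

    def run(self, task: Task, handler: Callable[[dict], Any]) -> Optional[Any]:
        """Call handler until it succeeds or max_attempts is reached."""
        last_error: Optional[Exception] = None
        while task.attempts < self.max_attempts:
            task.attempts += 1
            try:
                return handler(task.payload)
            except Exception as exc:  # broad on purpose: any failure is retried
                last_error = exc
        # Retries exhausted: park the task with its last error for later recovery.
        self.dead_letters.append((task, repr(last_error)))
        return None
```

A task whose handler keeps failing ends up in `dead_letters` together with its last error, where a separate recovery step could inspect and replay it.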

2) MLOps Release Gate Agent

A Kubernetes-first release governance project focused on controlled promotion, plan/execute workflows, policy gating, and observable ML operations.

Highlights

  • Release-gate workflow for evaluation, approval, and promotion
  • MLflow-backed artifacts and runtime decisioning
  • Prometheus and OpenTelemetry instrumentation with Jaeger tracing

Repo: https://github.com/narendersurabhi/mlops-release-gate-agent
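
As a rough illustration of the evaluation step in a release gate, the sketch below compares candidate metrics against fixed thresholds and reports why a promotion would be blocked. `GateRule`, `evaluate_gate`, and the metric names are assumptions made for this example, not the repository's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GateRule:
    # Hypothetical rule shape: one metric, one threshold, one direction.
    metric: str
    threshold: float
    higher_is_better: bool = True


def evaluate_gate(
    metrics: dict[str, float], rules: list[GateRule]
) -> tuple[bool, list[str]]:
    """Return (passed, failures); a missing metric counts as a failure."""
    failures: list[str] = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            failures.append(f"{rule.metric}: missing")
            continue
        if rule.higher_is_better:
            ok = value >= rule.threshold
        else:
            ok = value <= rule.threshold
        if not ok:
            failures.append(
                f"{rule.metric}: {value} does not meet threshold {rule.threshold}"
            )
    return (not failures, failures)
```

Returning the list of failures alongside the boolean keeps the decision auditable: an approval step can show exactly which criteria blocked promotion.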

3) MCP Control Plane

A production-style MCP server with multiple transports, auth policy controls, and observability built in from the start.

Highlights

  • Stdio and HTTP MCP transport support
  • Scope-based bearer-token authorization model
  • OpenTelemetry and Prometheus instrumentation for runtime visibility

Repo: https://github.com/narendersurabhi/mcp-control-plane
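
A scope-based bearer-token check might look roughly like the sketch below. The token table, scope names, and `authorize` function are hypothetical and only illustrate the idea. A production server would typically validate signed tokens or look them up in a secret store rather than in a plain dict.

```python
from typing import Dict, Optional, Set

# Hypothetical static token table, for illustration only.
TOKEN_SCOPES: Dict[str, Set[str]] = {
    "token-reader": {"tools:read"},
    "token-admin": {"tools:read", "tools:call"},
}


def authorize(
    authorization_header: Optional[str],
    required_scope: str,
    token_scopes: Dict[str, Set[str]] = TOKEN_SCOPES,
) -> bool:
    """Return True only if the header carries a known bearer token
    that has been granted required_scope."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return False
    granted = token_scopes.get(token)
    return granted is not None and required_scope in granted
```

With this table, `authorize("Bearer token-reader", "tools:call")` is rejected while the same call with `token-admin` is allowed.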

4) LLM Customization Ops

An end-to-end repository for LLM customization workflows spanning fine-tuning, RLHF-style preference optimization, evaluation, and serving.

Highlights

  • LoRA and QLoRA supervised fine-tuning flows
  • DPO-style preference optimization and evaluation gates
  • Production-facing API and observability patterns for model serving

Repo: https://github.com/narendersurabhi/llm-customization-ops
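
For readers unfamiliar with DPO, here is a minimal sketch of the standard per-example DPO loss, written with the Python standard library only. It takes summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. The function name, argument names, and the default `beta` are choices made for this example, not the repository's code.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)), where margin is the
    policy-vs-reference log-ratio of the chosen response minus that of the
    rejected response."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)); branch to avoid overflow in exp.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy and the reference agree, the margin is zero and the loss is `log(2)`. The loss falls as the policy raises the chosen response's likelihood relative to the rejected one.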

5) ML Platform Release Gates

A reference implementation for model governance and promotion across training, evaluation, registry flow, and serving.

Highlights

  • Train, evaluate, promote, and serve workflow
  • Artifact and model-registry oriented release path
  • FastAPI serving with Prometheus and Grafana-compatible metrics

Repo: https://github.com/narendersurabhi/ml-platform-release-gates

Selected Strengths

  • Contracts-first design for predictable component boundaries
  • LLM lifecycle coverage from training and post-training through evaluation and serving
  • Evidence-based release decisions using measurable criteria
  • Operability as a hard requirement, not a follow-on task
  • Reproducible local and CI workflows with deterministic test paths

Core Stack

  • Platform and API: Python, FastAPI, Docker, Kubernetes, Helm, MLflow
  • Observability: OpenTelemetry, Prometheus, Grafana, Jaeger, Loki
  • LLM systems: pretraining concepts, supervised fine-tuning, RLHF and preference optimization, RAG, tool-calling agents, workflow orchestration, evaluation pipelines
  • ML and data: PyTorch, XGBoost, PySpark, classical ML pipelines

Open To

  • Applied AI Engineer
  • AI Platform Engineer
  • ML Platform Engineer

Connect

Pinned

  1. agentic-workflow-studio (Public)

    Agentic Workflow Studio

    Python

  2. ml-platform-release-gates (Public)

    Reference system for model governance: evaluation gates → promotion workflow → serving API → Prometheus/Grafana observability.

    Python

  3. mcp-control-plane (Public)

    MCP control plane server for agent tooling: stdio and HTTP transports, auth scopes, and deployment-ready telemetry and packaging.

    Python

  4. langchain-prod-starter (Public)

    Production-ready LangChain + FastAPI starter: testable chains, RAG, tool-using agent demo, and ops-friendly endpoints.

    Python

  5. llm-customization-ops (Public)

    End-to-end LLM customization ops: LoRA/QLoRA SFT + DPO, eval gates, and a service layer with telemetry hooks.

    Python

  6. mlops-starter-aws (Public)

    AWS MLOps starter template: train/test → containerized FastAPI on Lambda + a basic drift-monitor scaffold (CDK).

    Makefile