jayshah5696/README.md

Hi, I'm Jay Shah 👋

Senior Data Scientist  |  LLM/RAG Systems  |  Agent Evaluation  |  B2B Intent Intelligence

🌐 jayshah.dev  |  💼 LinkedIn  |  📍 San Francisco, CA


About

Senior Data Scientist at 6sense working on the Autonomous Revenue Engine: agent evaluation, Agentic RAG pipelines, and B2B intent intelligence. Previously at Avathon (formerly SparkCognition), where I led foundation-model MLOps, built an MCP-powered agent platform running 20+ workflows, and shipped RAG and anomaly detection systems for energy domains.

MS in Industrial & Systems Engineering (Applied Statistics) from Texas A&M. BE in Mechanical Engineering from GTU. DataKind Ambassador.


What I'm working on

Building production agentic systems on self-hosted hardware and writing about what actually works.

I write about real ML systems at jayshah.dev. Recent posts:

Date  |  Post
Mar 2026  |  Can Dense Retrieval Beat BM25 for Entity Resolution? (And At What Cost?)
Feb 2026  |  File-Based Memory Is a Terrible Idea That Somehow Works
Feb 2026  |  Stop Renting Your Workflow: Building a Custom AI Coding Agent with Pi
Sep 2024  |  Concept to Code - Deploying ColBERT with RAGatouille on Modal Labs in Minutes
Sep 2024  |  Beyond the Hype - Practical Strategies for Implementing Superior RAG

Featured Projects

Project  |  Stars  |  Description
Medha  |  -  |  RAG-based search and reasoning engine
Entity Resolution POC  |  -  |  Autonomous entity resolution and triplet extraction
pravah  |  ⭐ 30  |  LLM-powered local search engine
AutomaticValuationModel  |  ⭐ 38  |  Real estate AVM using ML on comparable properties
session-aggregator  |  -  |  Unified search + export across AI coding session history
pi-agent-extensions  |  -  |  Extensions for Pi Coding Agent: sessions, ask_user, handoff
Power_Curve_Estimation  |  ⭐ 5  |  Wind turbine power curve modeling

Highlights

  • 🏆 RAG-a-thon 2024 winner (StreamLens)
  • 🏆 LlamaIndex Ragathon 2024 winner
  • 📜 Patents: energy production forecasting (US) and industrial defect detection (IN)
  • 🤝 DataKind Ambassador: student dropout prediction models
  • 💡 Fine-tuned Llama-2 for Gujarati language (Gujarati Llama)

GitHub Stats

[Images: GitHub streak, Jay's GitHub activity, repos by language, most used languages]


Built with 🍵 tea  |  Updated Mar 2026

Pinned

  1. AutomaticValuationModel (Public)

    Automated valuation model (AVM) is the name given to a service that can provide real estate property valuations using mathematical modelling combined with a database. Most AVMs calculate a property…

    Jupyter Notebook  |  ⭐ 38  |  6 forks

  2. pravah (Public)

    LLM-powered local search engine

    Python  |  ⭐ 30  |  2 forks

  3. Power_Curve_Estimation (Public)

    Wind energy is one of the fastest growing renewable energy sources. According to a report issued by the U.S. Department of Energy (DOE), wind power installation in the United States increased by ne…

    Jupyter Notebook  |  ⭐ 5  |  2 forks

  4. session-aggregator (Public)

    A unified powerhouse for AI coding history. Sync, search, and export sessions from across all your AI development tools with a beautiful TUI and semantic search.

    Python

  5. pi-agent-extensions (Public)

    Collection of extensions for Pi Coding Agent (sessions, ask_user, handoff)

    TypeScript 5

  6. entity-resolution-poc (Public)

    Pure embedding approach to high-scale entity resolution: a research POC for people search at 500M-record scale

    Python  |  ⭐ 1  |  1 fork