Applied AI Engineer with a strong full-stack background, focused on building production-grade RAG and LLM systems.
I work at the intersection of:
- Retrieval-Augmented Generation (RAG)
- Embeddings & semantic search
- Backend systems & APIs
- Product-focused AI engineering
Day to day, that means:
- Designing end-to-end RAG systems for large document collections
- Building ingestion, chunking, and embedding pipelines
- Semantic & hybrid retrieval with citation-aware context assembly
- Streaming LLM responses integrated into real products
- Making RAG systems observable, tunable, and maintainable
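The core of the pipeline above (chunk → embed → retrieve) can be sketched in a few lines. This is a toy illustration, not production code: a bag-of-words counter stands in for a real embedding model, and the function names (`chunk`, `embed`, `retrieve`) are hypothetical.

```python
from collections import Counter
from math import sqrt

def chunk(text: str, size: int = 40) -> list[str]:
    # Naive fixed-size word chunking; real pipelines split on document structure.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system calls an embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and return the top-k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In production the same shape holds, but each stage is swapped for the real thing: structure-aware chunking, a hosted or local embedding model, and a vector database (Qdrant / FAISS) doing the similarity search.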
My stack:
- LLMs & RAG: OpenAI-compatible APIs, local models
- Backend: FastAPI, Node.js
- Data: PostgreSQL, vector databases (Qdrant / FAISS)
- Infra: background workers, async pipelines
- Frontend: React / Next.js (when product UX matters)
A few principles I work by:
- RAG is a system, not a prompt.
- Data quality beats clever prompting.
- Production constraints shape good AI.
What I'm building:
- 🧠 RAG Platform — production-oriented retrieval system for unstructured data
- 🛠 Observer — AI-native tooling for working with complex data flows
- 📦 OSS libraries used in production products
📍 New York, NY
🔗 https://github.com/keske