Pappu Sunil Yadav revoker3661

Hello, I'm Pappu Sunil Yadav

📍 India | 💼 Open to Remote GenAI / ML Opportunities

🚀 About Me

I am a GenAI & Applied Machine Learning Engineer focused on building real-world, production-grade AI systems rather than demo-only models.

My core expertise lies in LLMs, Retrieval-Augmented Generation (RAG), multimodal AI, and document intelligence, with a strong emphasis on system architecture — how data flows from raw sources to reliable, grounded AI outputs.

I enjoy working at the intersection of ML engineering, GenAI system design, and practical deployment, especially in high-impact domains like healthcare and medical AI.

🧠 Featured Projects

🔹 Multimodal Clinical RAG Assistant (Medical Text + Image)

What it is:
A doctor-assistive multimodal AI system that jointly understands medical text, tables, diagrams, and patient images to generate clinically grounded explanations using a multimodal LLM.

Architecture: Dual-encoder retrieval using MiniLM for medical text/tables/JSONL entries and OpenCLIP ViT-B/32 for diagrams and patient images.
Retrieval: Hybrid multimodal search with ChromaDB, enabling cross-verification between symptoms, diagnosis text, and visual evidence.
Reasoning: Dynamic prompt composition feeding retrieved text + images into IDEFICS2-8B (4-bit).
Deployment: Interactive Streamlit UI displaying matched visuals, source text, diagnosis, causes, and treatments in real time.
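
As a rough illustration of the dual-index retrieval idea described above, the sketch below queries a text index and an image index separately and then treats any document returned by both as cross-verified. It is a minimal sketch under stated assumptions, not the project's code: the `Entry` class, the `retrieve` function, the file paths, and the 3-dimensional toy vectors are all hypothetical stand-ins. A real system would use MiniLM and OpenCLIP embeddings (which have different dimensions) stored in ChromaDB rather than in Python lists.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    doc_id: str          # shared key linking a text passage to its figure
    payload: str         # passage text or image path (hypothetical values)
    vector: List[float]  # toy embedding; a real system would use encoder output


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(index, query_vec, k):
    """Return the k (score, entry) pairs most similar to query_vec, best first."""
    scored = [(cosine(e.vector, query_vec), e) for e in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def retrieve(text_index, image_index, text_query_vec, image_query_vec, k=2):
    """Query each index with its own query vector, then cross-check the hits.

    A doc_id that appears in both result lists counts as cross-verified.
    """
    text_hits = top_k(text_index, text_query_vec, k)
    image_hits = top_k(image_index, image_query_vec, k)
    text_ids = {e.doc_id for _, e in text_hits}
    verified = [e.doc_id for _, e in image_hits if e.doc_id in text_ids]
    return text_hits, image_hits, verified


# Toy data: 3-d vectors stand in for text-encoder and image-encoder embeddings.
text_index = [
    Entry("pneumonia", "Lobar consolidation with fever and cough.", [0.9, 0.1, 0.0]),
    Entry("fracture", "Cortical break in the distal radius.", [0.0, 0.2, 0.9]),
]
image_index = [
    Entry("pneumonia", "figures/cxr_consolidation.png", [0.8, 0.3, 0.1]),
    Entry("fracture", "figures/wrist_xray.png", [0.1, 0.1, 0.9]),
]

text_hits, image_hits, verified = retrieve(
    text_index, image_index,
    text_query_vec=[1.0, 0.0, 0.0],
    image_query_vec=[0.9, 0.2, 0.0],
    k=1,
)
```

Only the cross-verified entries would then be passed, together with their images, into the prompt for the multimodal model.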

Impact:
• Achieved 100% text retrieval accuracy and near-perfect image retrieval.
• Significantly reduced hallucinations compared to text-only RAG systems.

🎥 Demo: YouTube Channel
🔗 Repo: GitHub

🔹 AI Document Intelligence Pipeline (PDF → Structured JSONL)

Designed a high-accuracy document AI pipeline to convert unstructured medical textbooks into structured JSONL datasets optimized for RAG and multimodal LLMs.

Implemented document layout segmentation using Detectron2 + PubLayNet.
Engineered dual-stage table verification with Microsoft Table Transformer (a DETR-based model).
Integrated PaddleOCR for extracting dense labels from scanned diagrams.
Generated spatially grounded metadata including bounding boxes and captions.
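
To make "spatially grounded metadata" concrete, here is a minimal sketch of what one JSONL record per layout region might look like, with a round trip through the JSON Lines format. The `LayoutEntry` schema, its field names, and the sample values are assumptions made for illustration; the actual pipeline's schema is not shown in this README.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class LayoutEntry:
    page: int
    kind: str                 # e.g. "text", "table", "figure"
    bbox: List[float]         # [x0, y0, x1, y1] in page pixel coordinates
    content: str              # extracted text, table markup, or OCR labels
    caption: Optional[str] = None


def to_jsonl(entries):
    """Serialize entries as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(e), ensure_ascii=False) for e in entries)


def from_jsonl(text):
    """Parse JSON Lines back into LayoutEntry objects, skipping blank lines."""
    return [LayoutEntry(**json.loads(line))
            for line in text.splitlines() if line.strip()]


# Hypothetical sample records, not taken from any real textbook.
entries = [
    LayoutEntry(page=12, kind="figure", bbox=[80.0, 140.0, 520.0, 460.0],
                content="aorta; left ventricle", caption="Figure 3.2 (hypothetical)"),
    LayoutEntry(page=12, kind="text", bbox=[80.0, 480.0, 520.0, 700.0],
                content="The left ventricle pumps oxygenated blood into the aorta."),
]

jsonl = to_jsonl(entries)
restored = from_jsonl(jsonl)
```

Keeping the bounding box and caption on every record lets a downstream retriever point back to the exact page region an answer came from.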

Results:
• Processed 500+ pages per book, generating 40,000+ JSONL entries.
• Achieved 95% table extraction accuracy.
• Ran the pipeline on an NVIDIA L4 cloud GPU.

🔗 GitHub

🔹 Pneumonia Detection Using Deep Learning (VGG19)

Built an automated pneumonia detection system using chest X-ray images to assist radiologists in early and accurate diagnosis.

Applied image preprocessing and augmentation using ImageDataGenerator.
Developed a transfer-learning pipeline with VGG19 and custom dense layers.
Used EarlyStopping to limit overfitting and ReduceLROnPlateau to lower the learning rate when the monitored metric plateaued.
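
The two callbacks named above both rest on a patience counter. The sketch below replays a list of validation losses and applies both rules in plain Python. It is a simplified illustration of the idea, not Keras's implementation: the function name `simulate_training`, the default values, and the loss numbers are hypothetical, both rules share one "best loss" tracker, and options such as `min_delta`, `cooldown`, and `restore_best_weights` are left out.

```python
def simulate_training(val_losses, lr=1e-3, stop_patience=4, lr_patience=2,
                      factor=0.5, min_lr=1e-6):
    """Replay validation losses and apply both patience rules per epoch.

    Returns (epochs_run, lr_history), where lr_history[i] is the learning
    rate in effect during epoch i.
    """
    best = float("inf")
    stop_wait = 0   # epochs since the last improvement (early stopping)
    lr_wait = 0     # epochs since the last improvement or the last LR cut
    lr_history = []

    for epoch, loss in enumerate(val_losses):
        lr_history.append(lr)
        if loss < best:
            # Improvement: remember it and reset both counters.
            best = loss
            stop_wait = 0
            lr_wait = 0
            continue
        stop_wait += 1
        lr_wait += 1
        if stop_wait >= stop_patience:
            return epoch + 1, lr_history      # stop early
        if lr_wait >= lr_patience:
            lr = max(lr * factor, min_lr)     # cut the LR, then wait again
            lr_wait = 0

    return len(val_losses), lr_history


# Made-up loss curve: improves for three epochs, then drifts upward.
losses = [0.60, 0.45, 0.40, 0.41, 0.42, 0.43, 0.44, 0.39]
epochs_run, lr_history = simulate_training(losses)
```

With these made-up losses, the learning rate is halved after two stagnant epochs and training stops after four, so the last loss value in the list is never reached.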

Performance: Achieved 92–97% accuracy for NORMAL vs PNEUMONIA classification.

🔹 Personalized Medicine Recommendation System

Developed a symptom-based medical recommendation system that predicts diseases and provides actionable healthcare guidance.

Trained a support vector classifier (SVC) that achieved 100% test accuracy.
Built a complete ML pipeline including feature encoding and model comparison.
Integrated medical knowledge modules for medications, diets, precautions, and workouts.
Delivered as a full Flask web application with dynamic UI.

🔹 Other Projects

Breast Cancer Classification (Neural Network)
Heart Disease Prediction (Logistic Regression)
Route Optimization System (A*) — Hackathon 2025
Women Safety App — Android + Firebase
Steganography Encryption System
RASA Tour Chatbot

🛠️ Tech Stack

Languages: Python, SQL, C/C++
ML / DL: TensorFlow, PyTorch, Scikit-learn
GenAI: Hugging Face, LangChain, LangGraph, LLMs, RAG
Computer Vision & OCR: OpenCV, Detectron2, PaddleOCR
Vector Databases: ChromaDB
Deployment: Flask, Streamlit, Docker
Infrastructure: NVIDIA L4 GPU

📫 Let’s Connect

📧 Email: yadavpappu3661@gmail.com
💼 LinkedIn: linkedin.com/in/pappu-yadav-3319ab289
💻 GitHub: github.com/revoker3661
🎥 YouTube: youtube.com/@pappuyadav-js3pq

⭐️ Thanks for visiting — let’s build impactful GenAI systems together!
