PaperSense is a Retrieval-Augmented Generation (RAG) system that lets users upload research papers and related documents as PDFs and ask questions about them. It retrieves the relevant document sections using semantic search and generates grounded answers with Google's Gemini LLM (gemini-2.5-flash).
• Upload research papers dynamically via REST API or UI
• Automatic PDF parsing, chunking, and vector embedding
• Semantic retrieval using FAISS
• Context-aware question answering using Gemini
• Source citation for every answer
• Minimal and clean Streamlit user interface
- PDFs are uploaded through the REST API or the UI
- Text is extracted and split into overlapping chunks
- Chunks are embedded using Sentence Transformers
- Embeddings are stored in a FAISS vector index
- User queries retrieve the most relevant chunks
- Gemini generates answers grounded in retrieved context
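The pipeline above can be illustrated with a minimal, standard-library-only sketch. It is not the project's code: the word-count `embed` function is a toy stand-in for a Sentence Transformers model, the brute-force `retrieve` function stands in for a FAISS index, and all function names, the chunk sizes, and the prompt wording are assumptions made for illustration.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows; neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of a neural model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Brute-force top-k search; returns (chunk_index, chunk_text) pairs."""
    q = embed(query)
    ranked = sorted(enumerate(chunks), key=lambda pair: cosine(q, embed(pair[1])), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    """Number each retrieved chunk so the model can cite its sources."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in hits)
    return (
        "Answer using only the context below and cite chunk numbers.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    chunks = [
        "faiss stores dense vectors",
        "gemini writes the answer",
        "pypdf extracts text from pdf files",
    ]
    hits = retrieve("which library extracts text", chunks, k=1)
    print(build_prompt("Which library extracts text?", hits))
```

Overlapping windows keep a sentence that straddles a chunk boundary intact in at least one chunk, at the cost of some duplicated text in the index. In a real deployment the prompt returned by `build_prompt` would be sent to Gemini, and the chunk numbers would be mapped back to document names and pages for citation.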
• Backend API: Flask
• LLM: gemini-2.5-flash
• Embeddings: Sentence Transformers
• Vector Store: FAISS
• PDF Parsing: pypdf
• UI: Streamlit
- Clone the repository
- Create a virtual environment
- Install dependencies from requirements.txt
- Add your own Gemini API key to a .env file in the project root
- Run the Flask API
- Run the Streamlit UI
• Research paper analysis
• Literature review assistance
• Academic project demos
• RAG-based GenAI experimentation
The .env file containing API keys is intentionally excluded from the repository. Users must provide their own API keys to run the application.
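As a rough sketch of how an application might read that key and fail early when it is missing, the snippet below parses `KEY=VALUE` lines with the standard library only. The variable name `GEMINI_API_KEY` and the helper names are assumptions, not taken from the project, which may well rely on a library such as python-dotenv instead.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(env_text="", name="GEMINI_API_KEY", environ=None):
    """Prefer the process environment, then fall back to the .env contents.

    `name` is a hypothetical variable name used for illustration.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(name) or parse_env(env_text).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

A caller would typically read the file once at startup, for example `get_api_key(open(".env").read())`, so that a missing key surfaces as a clear error before any request is sent to the LLM.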