Overview • Features • Architecture • Setup • Usage & API

## Overview
The Smart AI Agent is a sophisticated, full-stack application capable of answering complex queries by intelligently bridging private knowledge and the public internet.
Instead of relying solely on its training data, the agent dynamically routes user questions. It first attempts to use Retrieval-Augmented Generation (RAG) via a custom Pinecone vector database. If the retrieved internal knowledge is deemed insufficient by the LLM, or if the user explicitly requires internet access, the agent autonomously falls back to real-time web search using the Tavily API.
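The routing flow described above can be sketched in plain Python. This is a simplified illustration, not the project's actual code: `retrieve`, `judge_sufficient`, `web_search`, and `generate` are hypothetical stand-ins for the LangGraph nodes backed by Pinecone, the LLM judge, Tavily, and Groq respectively.

```python
def route_query(query, enable_web_search, retrieve, judge_sufficient,
                web_search, generate):
    """Hedged sketch of the agent's routing: RAG first, then a
    web-search fallback when the LLM judges RAG insufficient."""
    trace = []
    chunks = retrieve(query)                      # always try Pinecone RAG first
    trace.append("router: retrieved from Pinecone RAG")
    if judge_sufficient(query, chunks):
        trace.append("judge: RAG sufficient")
        return generate(query, chunks), trace
    if enable_web_search:                         # user granted internet access
        trace.append("judge: RAG insufficient -> falling back to Tavily")
        results = web_search(query)
        return generate(query, results), trace
    trace.append("judge: RAG insufficient, web search disabled by user")
    return generate(query, chunks), trace
```

The `trace` list mirrors the "Agent Trace" idea from the UI: every routing decision is recorded alongside the answer.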
With a decoupled architecture featuring a FastAPI backend and a Streamlit frontend, this project is designed for scalability, transparency, and granular user control.
## Features

- 🔀 Intelligent Routing: Combines internal RAG knowledge with real-time web search, dynamically selecting the best information source for each query.
- 🧠 Contextual RAG Sufficiency: Uses the LLM to judge whether retrieved RAG content is sufficient to answer a query; if not, it avoids incomplete answers by escalating to a web search.
- 🕵️ Transparent AI Workflow: The UI features an "Agent Trace" that provides a detailed, step-by-step log of the agent's internal reasoning, routing decisions, and retrieval summaries.
- 🎛️ User-Controlled Web Access: A UI toggle lets users strictly confine the agent to internal documents or grant it broader internet access.
- 📄 Dynamic Knowledge Ingestion: Upload PDF documents directly through the UI to have them automatically chunked, embedded, and indexed into the Pinecone knowledge base.
- 💾 Persistent Memory: Uses LangGraph's checkpointing to maintain conversation context and memory across multiple chat turns.
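In the app itself, persistent memory comes from LangGraph's checkpointer keyed by session. As a rough, stdlib-only illustration of the idea (the class and method names below are invented for this sketch, not the project's API):

```python
from collections import defaultdict

class SessionMemory:
    """Simplified stand-in for LangGraph checkpointing: keeps the
    message history per session_id so later turns see earlier context."""
    def __init__(self):
        self._histories = defaultdict(list)

    def append(self, session_id, role, content):
        self._histories[session_id].append({"role": role, "content": content})

    def history(self, session_id):
        return list(self._histories[session_id])

memory = SessionMemory()
memory.append("test-session-001", "user", "What are the treatments for diabetes?")
memory.append("test-session-001", "assistant", "Common treatments include...")
```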
## Architecture

The application is built on a clean, layered architecture ensuring separation of concerns between the user interface, API endpoints, and the LangGraph AI logic.
```mermaid
graph TD
    U[User] -->|Chat & PDFs| S[Streamlit Frontend]
    S <-->|REST API| F[FastAPI Backend]
    subgraph Agent Core
        F --> A{LangGraph Agent<br>Groq Llama 3}
        A -->|Evaluates Query| R{Decision Router}
        R -->|Internal Knowledge| DB[(Pinecone VectorDB)]
        DB -->|Retrieve Chunks| Eval{Is RAG Sufficient?}
        Eval -->|Yes| Output[Generate Response]
        Eval -->|No| Web[Tavily Search API]
        R -->|Requires Internet| Web
        Web --> Output
    end
    Output -->|Agent Trace & Text| F
```
### Tech Stack

| Component | Technology | Role |
|---|---|---|
| Frontend | Streamlit | Interactive chat UI, state management, and trace logs |
| Backend API | FastAPI | High-performance async API to handle requests |
| Agent Core | LangGraph | AI workflow orchestration, routing, and memory |
| LLM Inference | Groq (Llama 3) | Ultra-fast reasoning and text generation |
| Search Engine | Tavily API | Real-time internet search and scraping |
| Embeddings | HuggingFace | `all-MiniLM-L6-v2` for generating document vectors |
| Vector Store | Pinecone | Storing and retrieving embedded document chunks |
### Project Structure

```
agentBot/
├── frontend/
│   ├── app.py               # Streamlit entry point
│   ├── ui_components.py     # Chat UI, toggle, trace
│   ├── backend_api.py       # API communication logic
│   ├── session_manager.py   # Streamlit state management
│   └── config.py            # Frontend configuration
│
├── backend/
│   ├── main.py              # FastAPI entry point
│   ├── agent.py             # LangGraph AI agent workflow
│   ├── vectorstore.py       # Pinecone RAG logic & PyPDFLoader
│   └── config.py            # API keys and environment variables
│
├── requirements.txt         # Python dependencies
└── .env                     # Environment variables (ignored in Git)
```
## Setup

### Prerequisites

Ensure you have Python 3.9+ installed and the following accounts configured:

- Pinecone: Create an index named `rag-index` with `384` dimensions and the `cosine` metric.
- API Keys: You will need keys for Groq, Pinecone, and Tavily.
### Installation

```bash
git clone https://github.com/your-username/agentBot.git
cd agentBot
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
### Environment Variables

Create a `.env` file at the root of the project:

```env
GROQ_API_KEY="your_groq_api_key_here"
PINECONE_API_KEY="your_pinecone_api_key_here"
PINECONE_ENVIRONMENT="your_pinecone_environment"
TAVILY_API_KEY="your_tavily_api_key"
FASTAPI_BASE_URL="http://localhost:8000"
```
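The backend's `config.py` presumably reads these variables at startup. A minimal stdlib sketch of that step (the `load_settings` helper is hypothetical; the variable names match the `.env` above, and the defaults chosen here are assumptions):

```python
import os

def load_settings():
    """Read required keys from the environment, failing fast when one
    is missing instead of erroring deep inside a request handler."""
    required = ["GROQ_API_KEY", "PINECONE_API_KEY", "TAVILY_API_KEY"]
    missing = [k for k in required if not os.environ.get(k)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "groq_api_key": os.environ["GROQ_API_KEY"],
        "pinecone_api_key": os.environ["PINECONE_API_KEY"],
        "pinecone_environment": os.environ.get("PINECONE_ENVIRONMENT", ""),
        "tavily_api_key": os.environ["TAVILY_API_KEY"],
        "fastapi_base_url": os.environ.get("FASTAPI_BASE_URL", "http://localhost:8000"),
    }
```

In practice a `.env` file is usually loaded into the environment first (e.g. with `python-dotenv`); this sketch only covers the read-and-validate step.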
## Usage

Because of the decoupled architecture, you need to start the backend and frontend separately.

**Terminal 1: Start the Backend (FastAPI)**

```bash
cd backend
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

**Terminal 2: Start the Frontend (Streamlit)**

```bash
cd ..
streamlit run frontend/app.py
```
## API

While you can use the Streamlit UI, the FastAPI backend is fully accessible for testing via Postman or cURL.

### PDF Upload

Uploads a PDF, chunks it, and indexes it into Pinecone.

- Body: `form-data`, key=`file`, type=`File`
- Response:

```json
{
  "message": "PDF 'doc.pdf' successfully uploaded and indexed.",
  "filename": "doc.pdf",
  "processed_chunks": 5
}
```
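Behind this endpoint, the extracted PDF text is split into overlapping chunks before being embedded and upserted to Pinecone. A rough sketch of the chunking step (the chunk size and overlap values here are illustrative assumptions, not the project's actual settings):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character windows with overlap, so
    sentences cut at a boundary still appear whole in one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, len(text), step)
            if text[i:i + chunk_size]]
```

The returned list length corresponds to the `processed_chunks` count in the response above.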
### Chat

Send a query and control web search permissions.

- Body (JSON):

```json
{
  "session_id": "test-session-001",
  "query": "What are the treatments for diabetes?",
  "enable_web_search": true
}
```

- Response:

```json
{
  "response": "Your agent's generated answer here...",
  "trace_events": [
    {
      "step": 1,
      "node_name": "router",
      "description": "Evaluated query and routed to Pinecone RAG.",
      "event_type": "router_decision"
    }
  ]
}
```
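The same call can be made from Python using only the stdlib. Note the `/chat` path below is an assumption for illustration; confirm the real route against the backend's auto-generated docs at `/docs`.

```python
import json
import urllib.request

def build_chat_request(base_url, session_id, query, enable_web_search):
    """Build the POST request for the chat endpoint.
    The '/chat' path is assumed, not confirmed by the backend code."""
    payload = json.dumps({
        "session_id": session_id,
        "query": query,
        "enable_web_search": enable_web_search,
    }).encode("utf-8")
    return urllib.request.Request(
        base_url + "/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send the request while the backend is running:
# with urllib.request.urlopen(build_chat_request(
#         "http://localhost:8000", "test-session-001",
#         "What are the treatments for diabetes?", True)) as resp:
#     print(json.load(resp)["response"])
```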
## Future Improvements

- Tool Expansion: Integrate tools like a calculator, calendar, or code interpreter.
- Token Streaming: Stream LLM output token-by-token to the frontend for a more responsive UI.
- Advanced RAG: Implement document re-ranking and multi-query translation.
- User Authentication: Add login profiles and persistent long-term memory databases for user chat history.
Built with ❤️ using LangGraph, Groq, and Streamlit

Star this repo if you find it helpful! ⭐