A full-stack AI application that uses a locally hosted LLaMA model (via Ollama) to summarize text. This project demonstrates how to integrate a powerful LLM with a Python backend (FastAPI) and a user-friendly frontend (Streamlit).
Developed as part of Internship Task #1.
## Features

- FastAPI Backend: Acts as the bridge between the user and the AI model.
- Streamlit Frontend: A clean, interactive web interface for users to input text.
- Local Inference: Runs entirely offline using Ollama (no API keys required!).
- Model Agnostic: Easily switch between `llama2`, `llama3`, or lightweight models like `phi3`.
## Tech Stack

- Python 3.10+
- Ollama (Model Runner)
- FastAPI (Backend API)
- Streamlit (Frontend UI)
- Uvicorn (ASGI Server)
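The stack above maps to a `requirements.txt` roughly like this (a sketch only; `requests` is an assumed dependency for the frontend's call to the backend, and no versions are pinned):

```text
fastapi
uvicorn
streamlit
requests
```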
## Hardware Requirements

Running local AI models requires decent hardware. Here is what you need to run this smoothly:
- RAM: 8GB minimum (16GB recommended).
  - Note: If you have 8GB, close your browser tabs before running the model!
- GPU (Optional but Recommended): NVIDIA GPU with CUDA installed.
  - Performance: Without a GPU, the model will run on your CPU. It works, but text generation will be slower.
- Storage: At least 4GB free space for model weights.
## Install Ollama

Download and install Ollama from [ollama.com](https://ollama.com). Then open your terminal and pull a model: `phi3` or `llama2`, depending on your RAM.

```shell
ollama pull phi3
# OR, if you have 16GB+ RAM:
ollama pull llama2
```
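Once a model is pulled, it is worth sanity-checking that the Ollama server responds before wiring up the backend. A minimal Python sketch, assuming Ollama's standard local HTTP API on port 11434:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(model: str, prompt: str) -> urllib.request.Request:
    # stream=False asks Ollama to return the whole completion as one JSON object.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )


if __name__ == "__main__":
    # Requires `ollama serve` (or the desktop app) to be running locally.
    with urllib.request.urlopen(build_request("phi3", "Say hello in one sentence.")) as resp:
        print(json.loads(resp.read())["response"])
```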
## Setup Instructions
1. Clone the repository.
2. Install dependencies:

   ```shell
   pip install -r requirements.txt
   ```

3. Start the backend:

   ```shell
   uvicorn backend.main:app --reload
   ```

4. Start the frontend:

   ```shell
   streamlit run frontend/app.py
   ```
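With both processes running, you can smoke-test the API directly from Python. The `/summarize` route and the `{"text": ...}` request body below are assumptions; match them to whatever `backend/main.py` actually defines:

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/summarize"  # uvicorn's default host/port; route is assumed


def build_body(text: str) -> bytes:
    # JSON body the backend is assumed to expect.
    return json.dumps({"text": text}).encode()


def summarize_remote(text: str) -> str:
    # POST the text to the FastAPI backend and return the model's summary.
    req = urllib.request.Request(
        API_URL, data=build_body(text), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["summary"]


if __name__ == "__main__":
    print(summarize_remote("Ollama lets you run large language models entirely on your own machine."))
```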