A professional demo web application showcasing a Large Language Model (LLM) deployed with FastAPI.
Designed to run entirely on CPU (Windows-compatible) — this project demonstrates end-to-end AI deployment, from backend to frontend.
🚀 Powered by **google/flan-t5-large**, an instruction-tuned model that produces high-quality, natural-language responses.
## ✨ Features

- ✅ Web Interface — Simple HTML form for real-time LLM input/output.
- ✅ Instruction-Tuned Responses — Generates clear, context-aware answers.
- ✅ Beam Search (num_beams=3) — Deterministic, accurate text generation (see the sketch below).
- ✅ CPU-Only Operation — Runs locally on Windows; no GPU required.
- ✅ Error Handling — Graceful handling of invalid or overly long prompts.
- ✅ Extendable — Easy to adapt for RAG, summarization, or chatbots.
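
The beam-search generation mentioned above might look roughly like the following minimal sketch, which uses the standard Hugging Face Transformers API. The function name `generate_response` mirrors the one referenced later in this README; the actual code in `app.py` may differ.

```python
# Minimal sketch of CPU inference with beam search; details may differ from app.py.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "google/flan-t5-large"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)  # loads on CPU by default

def generate_response(prompt: str, max_new_tokens: int = 200) -> str:
    """Run beam-search generation on CPU and return the decoded answer."""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(
        **inputs,
        num_beams=3,                    # beam search: deterministic, higher-quality output
        max_new_tokens=max_new_tokens,  # response length budget (affects latency)
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```
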
## 🛠️ Tech Stack

| Component | Description |
|---|---|
| 🐍 Python 3.11+ | Core language |
| ⚡ FastAPI | Backend framework |
| 🔥 Uvicorn | ASGI web server |
| 🧩 Jinja2 | HTML templating |
| 🧠 Transformers + PyTorch | LLM inference |
| 💻 HTML/CSS | Frontend interface |
## ⚙️ Installation

```bash
# 1️⃣ Clone the repository
git clone https://github.com/yourusername/llm-web-app.git
cd llm-web-app

# 2️⃣ Create a virtual environment
python -m venv venv
.\venv\Scripts\activate   # (Windows)

# 3️⃣ Install dependencies
pip install -r requirements.txt
```
## 🚀 Usage
1. Start the FastAPI app:

   ```bash
   uvicorn app:app --reload
   ```

2. Open your browser: http://127.0.0.1:8000
3. Enter a prompt in the web form and submit.
4. View the LLM response below the form.
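
The browser flow is all you need, but for a quick command-line smoke test, a request along these lines should work, assuming the form posts a `prompt` field to the root route `/` (check `templates/index.html` for the actual field name and form action):

```python
# Hypothetical smoke test; the field name "prompt" and the POST route "/" are
# assumptions, so adjust them to match the form defined in index.html.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/",
    data={"prompt": "Explain beam search in one sentence."},
)
print(resp.status_code)   # 200 on success
print(resp.text[:500])    # rendered HTML containing the model's answer
```
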
## 💬 Example Prompts
* "Explain the difference between supervised and unsupervised learning."
* "Write a Python function to calculate the Fibonacci sequence recursively."
* "Summarize the following text in 2 sentences:..."
* "Provide 3 tips for preparing for a technical interview."
💡 **Tip:** Frame prompts as clear, explicit instructions; the model is instruction-tuned and responds best to direct requests.
## 📂 Project Structure

```text
llm-web-app/
│
├─ app.py              # FastAPI backend with LLM integration
├─ templates/
│  └─ index.html       # HTML frontend
├─ requirements.txt    # Python dependencies
└─ README.md           # Project documentation
```
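
For orientation, here is a minimal sketch of how `app.py` might wire FastAPI, Jinja2, and the model together; route names, template variables, and the `generate_response` helper (from the earlier sketch) are illustrative and may not match the actual file.

```python
# Illustrative skeleton only; the real app.py may differ in routes and details.
from fastapi import FastAPI, Form, Request
from fastapi.templating import Jinja2Templates

app = FastAPI()
templates = Jinja2Templates(directory="templates")

@app.get("/")
def index(request: Request):
    # Render the empty prompt form.
    return templates.TemplateResponse("index.html", {"request": request, "response": None})

@app.post("/")
def ask(request: Request, prompt: str = Form(...)):
    # Run CPU inference (generate_response from the earlier sketch) and
    # re-render the page with the answer shown below the form.
    answer = generate_response(prompt)
    return templates.TemplateResponse("index.html", {"request": request, "response": answer})
```
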
## 📝 Notes

- ⚙️ The first model load may take 5–15 seconds on CPU.
- ⚡ Inference speed depends on your CPU and the `max_new_tokens` setting.
- 🚀 For faster responses, reduce `max_new_tokens` in `generate_response()`; see the example below.
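
For example, assuming the `generate_response` signature sketched earlier:

```python
# A smaller token budget gives faster CPU responses at the cost of shorter answers.
answer = generate_response(prompt, max_new_tokens=80)
```
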
## 🌱 Future Improvements

- ⏳ Add a loading spinner while the LLM generates responses.
- 🧠 Integrate RAG (Retrieval-Augmented Generation) for domain knowledge.
- 🐳 Deploy via Docker for cross-platform portability.
- 💬 Enable multi-user chat sessions with memory.
## ⚖️ License

This project is licensed under the MIT License — free to use, modify, and share with attribution.
## 👩‍💻 Author

**Sherry Courington**

AI Engineer | MLOps | Computer Vision | LLM Applications

📫 Connect on [LinkedIn](https://www.linkedin.com/in/scourington1/)
