🤖 LLM Web App (CPU-Compatible)

A professional demo web application showcasing a Large Language Model (LLM) deployed with FastAPI.
Designed to run entirely on CPU (Windows-compatible), this project demonstrates end-to-end AI deployment, from backend to frontend.

🚀 Powered by google/flan-t5-large, a powerful instruction-tuned model for high-quality, natural language responses.
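
For reference, loading the model for CPU-only inference with the Transformers library looks roughly like this (a minimal sketch, not necessarily the repo's exact code):

```python
# Minimal sketch: load google/flan-t5-large for CPU-only inference.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "google/flan-t5-large"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)  # defaults to CPU
model.eval()  # inference mode
```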


✨ Features

Web Interface — Simple HTML form for real-time LLM input/output.
Instruction-Tuned Responses — Generates clear, context-aware answers.
Beam Search (num_beams=3) — Deterministic, higher-quality text generation (see the sketch after this list).
CPU-Only Operation — Runs locally on Windows, no GPU required.
Error Handling — Graceful handling of invalid or long prompts.
Extendable — Easy to adapt for RAG, summarization, or chatbots.
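
A plausible shape for the generate_response() helper mentioned in the Notes below, reusing the tokenizer and model from the loading sketch above; the truncation limit and the default max_new_tokens are assumptions:

```python
import torch

def generate_response(prompt: str, max_new_tokens: int = 128) -> str:
    # Truncate overly long prompts rather than failing on them.
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            num_beams=3,        # beam search: deterministic, no sampling
            early_stopping=True,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```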


🖼️ Demo Screenshot

*App preview screenshot*


🧰 Tech Stack

| Component | Description |
| --- | --- |
| 🐍 Python 3.11+ | Core language |
| FastAPI | Backend framework |
| 🔥 Uvicorn | ASGI web server |
| 🧩 Jinja2 | HTML templating |
| 🧠 Transformers + PyTorch | LLM inference |
| 💻 HTML/CSS | Frontend interface |

⚙️ Installation

```bash
# 1️⃣ Clone the repository
git clone https://github.com/yourusername/llm-web-app.git
cd llm-web-app

# 2️⃣ Create a virtual environment
python -m venv venv
.\venv\Scripts\activate  # (Windows)

# 3️⃣ Install dependencies
pip install -r requirements.txt
```
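
requirements.txt ships with the repository; as a rough guide, the stack above implies something like the following (hypothetical, unpinned list):

```text
fastapi
uvicorn[standard]
jinja2
python-multipart   # required by FastAPI for HTML form parsing
transformers
torch
```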

🚀 Usage

1. Start the FastAPI app:
   ```bash
   uvicorn app:app --reload
   ```
2. Open your browser at http://127.0.0.1:8000.
3. Enter a prompt in the web form and submit.
4. View the LLM response below the form.
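
For orientation, the app.py wiring plausibly looks like the sketch below; the route paths, the prompt form field name, and the error handling are assumptions based on the feature list, not the repository's exact code:

```python
# Assumes the hypothetical generate_response() sketched in the Features section.
from fastapi import FastAPI, Form, Request
from fastapi.templating import Jinja2Templates

app = FastAPI()
templates = Jinja2Templates(directory="templates")

@app.get("/")
def index(request: Request):
    # Render the empty prompt form.
    return templates.TemplateResponse("index.html", {"request": request})

@app.post("/")
def ask(request: Request, prompt: str = Form(...)):
    # Run the model and re-render the page with the answer.
    try:
        answer = generate_response(prompt)
    except Exception as exc:
        answer = f"Error: {exc}"  # graceful handling of bad prompts
    return templates.TemplateResponse(
        "index.html", {"request": request, "prompt": prompt, "answer": answer}
    )
```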

💬 Example Prompts
* "Explain the difference between supervised and unsupervised learning."
* "Write a Python function to calculate the Fibonacci sequence recursively."
* "Summarize the following text in 2 sentences:..."
* "Provide 3 tips for preparing for a technical interview."
Tip: Frame prompts as clear, explicit instructions; the model is instruction-tuned and responds better to them.

📂 Project Structure

```text
llm-web-app/
├─ app.py               # FastAPI backend with LLM integration
├─ templates/
│   └─ index.html       # HTML frontend
├─ requirements.txt     # Python dependencies
└─ README.md            # Project documentation
```

📝 Notes

⚙️ First-time model load may take 5–15 seconds on CPU.
⚡ Inference speed depends on your CPU and max_new_tokens setting.
🚀 For faster responses, reduce max_new_tokens in generate_response().
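
Per the last note, the speed/quality trade-off is a one-argument change, assuming the generate_response() signature sketched earlier:

```python
# Shorter outputs decode noticeably faster on CPU.
answer = generate_response("Summarize beam search in one sentence.", max_new_tokens=64)
```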

🌱 Future Improvements

⏳ Add a loading spinner while the LLM generates responses.
🧠 Integrate RAG (Retrieval-Augmented Generation) for domain knowledge.
🐳 Deploy via Docker for cross-platform portability.
💬 Enable multi-user chat sessions with memory.

⚖️ License
This project is licensed under the MIT License — free to use, modify, and share with attribution.

👩‍💻 Author

Sherry Courington
AI Engineer | MLOps | Computer Vision | LLM Applications
📫 Connect on [LinkedIn](https://www.linkedin.com/in/scourington1/)
