A lightweight, self-contained Python project for running a local large language model (LLM) with minimal dependencies. It uses TinyLlama-1.1B-Chat-v1.0 with llama-cpp-python for inference and Rich for a user-friendly console chat interface.
- Out-of-the-box experience: Just run `run_app.bat` to install dependencies and start chatting.
- Minimal dependencies: Uses UV for fast dependency resolution and virtual environment management.
- Rich console UI: Interactive chat interface with syntax highlighting and formatted responses.
- Configurable LLM settings: Adjust temperature, top-p, max tokens, and system prompt via `chat.py`.
- Self-contained model: Includes the TinyLlama GGUF model file for immediate use.
```
local-llm-system/
├── .venv/                 # Python virtual environment (auto-created)
├── app/
│   ├── llm/
│   │   ├── chat.py        # Handles the chat UI and LLM configuration
│   │   └── engine.py      # Loads the LLM model and sets inference parameters
│   ├── models/            # Contains the GGUF model file
│   │   └── tinyllama-1.1b-chat-v1.0.Q8_0.gguf
│   └── app.py             # Main entry point for the application
├── pyproject.toml         # Project metadata and dependencies
├── README.md              # Project documentation
├── run_app.bat            # Script to install dependencies and run the app
└── uv.lock                # Lockfile for reproducible dependency resolution
```

- Windows OS (for .bat script compatibility)
- Python 3.10 or higher (recommended: 3.11+)
- Internet connection (only for initial dependency installation)
- Download the Project: Download the project and extract it to a folder of your choice.
- Run the Application: Double-click `run_app.bat` or run it from the command line. This will:
  - Install uv (if not already installed)
  - Create a virtual environment (`.venv`)
  - Sync dependencies using `uv sync`
  - Run the application with `uv run app/app.py`
- Start Chatting: Start chatting with the LLM in the console!
- Dependency Management
The project uses UV for fast dependency resolution and virtual environment management. Dependencies are listed in `pyproject.toml` and installed automatically when you run `run_app.bat`.
- Model Loading
`engine.py` loads the TinyLlama model from the `app/models/` directory using llama-cpp-python. The model is configured with:
- Context length: 2048 tokens
- Threads: 8 (adjustable)
- Verbose mode: off
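
For illustration, a minimal sketch of what such a loader could look like, assuming `engine.py` wraps `llama_cpp.Llama` directly (the `load_model` name and the path handling are assumptions, not necessarily the project's exact code):

```python
# engine.py (illustrative sketch, not the project's exact code)
from pathlib import Path

from llama_cpp import Llama

# Assumed location of the bundled GGUF model file relative to app/llm/engine.py.
MODEL_PATH = Path(__file__).resolve().parents[1] / "models" / "tinyllama-1.1b-chat-v1.0.Q8_0.gguf"


def load_model() -> Llama:
    """Load the TinyLlama GGUF model with the settings described above."""
    return Llama(
        model_path=str(MODEL_PATH),
        n_ctx=2048,      # context length in tokens
        n_threads=8,     # adjust to the number of CPU cores available
        verbose=False,   # suppress llama.cpp startup logs
    )
```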
- Chat Interface
`chat.py` creates a rich console UI using the Rich library. The chat system:
- Displays user queries and LLM responses in a formatted console.
- Maintains a conversation history (up to 6 turns).
- Uses a system prompt for consistent LLM behavior.
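
A rough sketch of such a loop, assuming it uses `create_chat_completion` from llama-cpp-python and the constants described in the settings section below (all names and default values here are illustrative):

```python
# chat.py (illustrative sketch)
from llama_cpp import Llama
from rich.console import Console
from rich.panel import Panel

SYSTEM_PROMPT = "You are a helpful assistant."  # assumed default prompt
MAX_HISTORY = 6  # number of conversation turns kept in context

console = Console()


def run_chat(llm: Llama) -> None:
    """Simple console chat loop with bounded history."""
    history: list[dict] = []
    while True:
        user_input = console.input("[bold cyan]You:[/] ")
        if user_input.strip().lower() in {"exit", "quit"}:
            break
        history.append({"role": "user", "content": user_input})
        # Keep only the most recent turns (user + assistant messages)
        # so the prompt stays within the context window.
        history = history[-MAX_HISTORY * 2:]
        response = llm.create_chat_completion(
            messages=[{"role": "system", "content": SYSTEM_PROMPT}] + history,
            max_tokens=256,
            temperature=0.7,
            top_p=0.95,
        )
        reply = response["choices"][0]["message"]["content"]
        history.append({"role": "assistant", "content": reply})
        console.print(Panel(reply, title="TinyLlama"))
```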
- Main Application
`app.py` initializes the LLM engine and starts the chat interface.
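
Conceptually, the entry point ties the two modules together, roughly like this (the import paths and function names are assumptions based on the project layout above):

```python
# app.py (illustrative sketch)
from llm.engine import load_model
from llm.chat import run_chat


def main() -> None:
    llm = load_model()   # load the TinyLlama GGUF model
    run_chat(llm)        # start the Rich console chat loop


if __name__ == "__main__":
    main()
```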
- LLM Settings
Edit `app/llm/chat.py` to adjust:
  - `SYSTEM_PROMPT`: Change the initial prompt for the LLM.
  - `max_tokens`: Limit the response length.
  - `temperature` / `top_p`: Control response randomness.
  - `MAX_HISTORY`: Adjust the number of conversation turns retained.
- Model Replacement
Replace the GGUF model in `app/models/` with any compatible GGUF file (e.g., from Hugging Face).
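
If you prefer to script the download, here is a hedged example using `hf_hub_download` (note that `huggingface_hub` is not among the project's dependencies, and the repository and filename below are examples only):

```python
# Hypothetical helper: fetch a different GGUF model into app/models/.
# Requires: pip install huggingface_hub (not part of this project's dependencies).
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",       # example repository
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",        # example quantization
    local_dir="app/models",
)
print(f"Model saved to {path}")  # update the model path in engine.py accordingly
```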
- Missing dependencies: Ensure `run_app.bat` completes without errors. If issues persist, run:

  ```
  cd Local-LLM-System
  uv sync
  ```

- Model not found: Verify the GGUF file is in `app/models/` and named `tinyllama-1.1b-chat-v1.0.Q8_0.gguf`.
- Slow performance: Reduce `max_tokens` or use a smaller model.
This project is open-source and free to use. The TinyLlama model is licensed under its respective terms (see Hugging Face).
- llama-cpp-python for LLM inference.
- Rich for the console UI.
- TinyLlama for the model.
The `.venv` directory is auto-created by `uv sync`. The `uv.lock` file ensures reproducible dependency resolution.


