A lightweight, extensible Python chat assistant that talks to local Large Language Models (LLMs) via Ollama.
It supports dynamic model selection with fallback, (basic) streaming, conversational history, prompt engineering utilities, and optional quality‑repair hooks.
Run fully offline once models are pulled. Ideal for prototyping agents, internal tools, or experimentation without sending data to external APIs.
- Key Features
- Architecture Overview
- Quick Start
- Requirements
- Installation
- Environment Variables
- Usage Examples
- Model Management (Ollama)
- Prompt & Quality Tips
- Streaming Mode
- Fallback Logic
- Extending (HTTP API Option)
- Troubleshooting
- Roadmap
- Contributing
- License
- ✅ Local inference via Ollama CLI
- ✅ Automatic model fallback (e.g. llama3 → mistral → phi3)
- ✅ Optional on‑demand model pull (ensure model present)
- ✅ Conversation history merging
- ✅ Blocking or basic streaming execution
- ✅ Configurable generation parameters (temperature, top_p, num_predict)
- ✅ Debug logging toggle
- ✅ Prompt sanitation (removal of control chars; see the sketch after this list)
- 🧪 Optional post‑generation “repair” function (quality enforcement)
- 🧩 Designed to be easily upgraded to Ollama’s HTTP API
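The sanitation step is conceptually just a control-character strip. A minimal sketch, for illustration only (the function name `sanitize_prompt` and the exact character set are assumptions, not the client's actual implementation):

```python
import re

# Strip ASCII control characters (keeping newline and tab) before sending a prompt.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def sanitize_prompt(text: str) -> str:
    return _CONTROL_CHARS.sub("", text)
```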
+----------------------+
| Your App / UI | (CLI script, FastAPI, Discord bot, etc.)
+-----------+----------+
|
v
+--------------+
| ollama_client| <-- model selection, fallback, prompt build,
+------+-------+ history injection, streaming, timeouts
|
v
+---------------+
| Ollama Daemon| <-- runs / pulls local LLMs
+---------------+
|
Local Disk (model blobs)
Main entrypoint function: ask_ollama(...) inside ollama_client.py.
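The examples in this README assume a call signature roughly like the following; parameter defaults and type hints are illustrative, not authoritative:

```python
from typing import Dict, List, Optional

def ask_ollama(
    prompt: str,
    model: Optional[str] = None,                      # defaults to OLLAMA_MODEL, then the fallback list
    history: Optional[List[Dict[str, str]]] = None,   # [{"role": "user"|"assistant", "content": "..."}]
    use_history: bool = False,
    stream: bool = False,
) -> str:
    """Send a prompt to a local Ollama model and return the reply text."""
    ...
```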
# Pull at least one model
ollama pull phi3
# Test the model quickly
ollama run phi3 "Hello"
# Run a Python one‑liner using the client
python -c "from ollama_client import ask_ollama; print(ask_ollama('Give me a friendly greeting.', model='phi3'))"| Component | Minimum |
|---|---|
| Python | 3.9+ |
| Ollama | 0.11.0+ |
| Disk | Depends on models (e.g. mistral ≈ 4.4 GB) |
| RAM | 8 GB minimum for small models; more recommended for larger |
git clone https://github.com/<YOUR_USER>/<YOUR_REPO>.git
cd <YOUR_REPO>
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# (Add dependencies if you introduce any; base version may have none)
pip install -r requirements.txt  # if the file exists

Check Ollama:
ollama --version
ollama list

| Variable | Purpose | Example |
|---|---|---|
| OLLAMA_MODEL | Preferred default model | phi3 |
| OLLAMA_FALLBACK_MODELS | Comma-separated list of fallback models | mistral,phi3 |
| SYSTEM_PROMPT | System role / behavior | "You are a concise helpful assistant." |
| GEN_TEMPERATURE | Sampling temperature | 0.6 |
| GEN_TOP_P | Nucleus sampling | 0.9 |
| GEN_NUM_PREDICT | Max tokens (-1 = model default) | 256 |
| OLLAMA_TIMEOUT | Inference inactivity timeout (seconds) | 180 |
| OLLAMA_DEBUG | Debug logs (1/0) | 1 |
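Inside ollama_client.py these would typically be read with os.getenv. A minimal sketch, assuming straightforward parsing (the constant names and defaults here are illustrative, not the client's actual internals):

```python
import os

# Illustrative defaults only -- the real client may differ.
MODEL = os.getenv("OLLAMA_MODEL", "phi3")
FALLBACK_MODELS = [m.strip() for m in os.getenv("OLLAMA_FALLBACK_MODELS", "").split(",") if m.strip()]
SYSTEM_PROMPT = os.getenv("SYSTEM_PROMPT", "You are a concise helpful assistant.")
TEMPERATURE = float(os.getenv("GEN_TEMPERATURE", "0.6"))
TOP_P = float(os.getenv("GEN_TOP_P", "0.9"))
NUM_PREDICT = int(os.getenv("GEN_NUM_PREDICT", "-1"))
TIMEOUT = int(os.getenv("OLLAMA_TIMEOUT", "180"))
DEBUG = os.getenv("OLLAMA_DEBUG", "0") == "1"
```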
Example:
export OLLAMA_MODEL=phi3
export OLLAMA_FALLBACK_MODELS=mistral,phi3
export SYSTEM_PROMPT="You are an accurate, polite assistant. Avoid hallucination."
export GEN_TEMPERATURE=0.5
export GEN_TOP_P=0.9
export OLLAMA_DEBUG=1

Basic call:

from ollama_client import ask_ollama
print(ask_ollama("Give me a short inspirational sentence.", model="phi3"))history = [
{"role": "user", "content": "Hi, who are you?"},
{"role": "assistant", "content": "I'm a local AI assistant running on your machine."}
]
response = ask_ollama(
"Summarize your capabilities in one sentence.",
model="phi3",
history=history,
use_history=True
)
print(response)

Basic streaming:

resp = ask_ollama(
"Explain what a hash map is in simple terms.",
model="phi3",
stream=True
)

Constraining length and format:

ask_ollama(
"Write a helpful 3-sentence explanation of recursion. Do not answer in bullet points.",
model="mistral"
)

If OLLAMA_MODEL=llama3 but it is not downloaded and OLLAMA_FALLBACK_MODELS=mistral,phi3:
- The client will pick the first installed model from that list automatically.
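To observe this fallback in practice, a sketch like the following should work, assuming llama3 is not installed locally while mistral or phi3 is, and assuming ask_ollama uses OLLAMA_MODEL when no model argument is given:

```python
import os

# Assumes llama3 is NOT installed locally, but mistral or phi3 is.
os.environ["OLLAMA_MODEL"] = "llama3"
os.environ["OLLAMA_FALLBACK_MODELS"] = "mistral,phi3"
os.environ["OLLAMA_DEBUG"] = "1"   # log which model was actually chosen

from ollama_client import ask_ollama
print(ask_ollama("Hello!"))        # transparently answered by mistral or phi3
```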
| Action | Command |
|---|---|
| List installed models | ollama list |
| Pull a model | ollama pull mistral |
| Run a quick test | ollama run phi3 "Hello" |
| Remove a model | ollama rm mistral |
| Daemon logs (Linux systemd) | journalctl -u ollama -f |
First run with a new model may be slow (download + unpack). Subsequent runs are much faster.
| Problem | Tip |
|---|---|
| Very short answers | Explicit length: “Answer in at least 2 sentences.” |
| Hallucinations | Add: “If unsure, say you are unsure.” |
| Unstable formatting | Provide exact template scaffold in prompt. |
| Language drift | System prompt: “Always answer in English.” |
| Repetition | Lower temperature / add “Avoid repeating phrases.” |
Few‑shot example:
history = [
{"role": "user", "content": "Give me a cheerful greeting."},
{"role": "assistant", "content": "Hello! I hope you're having a great day. How can I assist you?"},
{"role": "user", "content": "Another one."},
{"role": "assistant", "content": "Hi there! Ready whenever you are—what would you like to explore today?"}
]
ask_ollama("Give me a third unique greeting.", model="phi3", history=history, use_history=True)Current streaming is a simple stdout line reader (not token‑level structured events). It helps avoid “blank screen” during longer generations or downloads.
To upgrade:
- Use the Ollama HTTP API (/api/generate with stream: true) and parse the JSON events (a sketch follows below).
- Pipe them through a WebSocket for a web UI.
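A minimal sketch of that upgrade, assuming the Ollama daemon is listening on its default port (11434). With stream: true, /api/generate returns newline-delimited JSON objects, each carrying a partial response chunk; stream_generate is a hypothetical helper name, not part of the client:

```python
import json
import requests

def stream_generate(prompt, model="phi3"):
    """Yield response text chunks from Ollama's streaming /api/generate endpoint."""
    url = "http://localhost:11434/api/generate"
    payload = {"model": model, "prompt": prompt, "stream": True}
    with requests.post(url, json=payload, stream=True, timeout=180) as r:
        r.raise_for_status()
        for line in r.iter_lines():
            if not line:
                continue
            event = json.loads(line)        # one JSON object per line
            if event.get("done"):
                break
            yield event.get("response", "")

# Print tokens as they arrive:
for chunk in stream_generate("Explain what a hash map is in simple terms."):
    print(chunk, end="", flush=True)
print()
```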
- Try the preferred model (OLLAMA_MODEL or the provided argument).
- If it is not installed:
  - Iterate over OLLAMA_FALLBACK_MODELS and use the first installed one.
- If none are installed:
  - Optionally auto‑pull the preferred model (controlled by the client).
- Proceed with generation.

A sketch of this selection logic is shown below.
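A minimal sketch of that selection, assuming installed models are detected by parsing ollama list output; the function names and the auto_pull flag are illustrative, not the client's actual internals:

```python
import os
import subprocess

def installed_models():
    """Return base names of locally installed models by parsing `ollama list`."""
    out = subprocess.run(["ollama", "list"], capture_output=True, text=True, check=True).stdout
    # Skip the header row; the first column is the model name (e.g. "phi3:latest").
    return {line.split()[0].split(":")[0] for line in out.splitlines()[1:] if line.strip()}

def select_model(preferred=None, auto_pull=False):
    preferred = preferred or os.getenv("OLLAMA_MODEL", "phi3")
    fallbacks = [m.strip() for m in os.getenv("OLLAMA_FALLBACK_MODELS", "").split(",") if m.strip()]
    local = installed_models()
    if preferred in local:
        return preferred
    for candidate in fallbacks:          # first installed fallback wins
        if candidate in local:
            return candidate
    if auto_pull:                        # optionally pull the preferred model
        subprocess.run(["ollama", "pull", preferred], check=True)
        return preferred
    raise RuntimeError("No requested model is installed and auto-pull is disabled.")
```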
Upgrading to Ollama's HTTP API gains you:
- True parameter control (temperature, top_k, repeat_penalty, stop tokens)
- Proper streaming events
- Structured JSON
Sketch:
import requests

def http_generate(prompt, model="mistral"):
    """Blocking (non-streaming) call to Ollama's HTTP generate endpoint."""
    url = "http://localhost:11434/api/generate"
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0.6, "top_p": 0.9},
    }
    r = requests.post(url, json=payload, timeout=180)
    r.raise_for_status()
    return r.json().get("response")

| Symptom | Cause | Fix |
|---|---|---|
| Long silence, no answer | Model downloading | Run ollama pull <model> first; show progress |
| Empty / nonsense output | Weak small model | Use better prompt or switch to mistral / llama3 |
| Timeout error | Slow hardware / large model | Increase OLLAMA_TIMEOUT |
| CLI not found | PATH issue | Reinstall or add Ollama to PATH |
| High RAM usage | Large model loaded | Use a smaller model (phi3, qwen2:0.5b) |
| Mixed language output | Not constrained | Add explicit language rule in system prompt |
| Very short answer | Under-specified prompt | Add minimum length requirement |
Debug:
export OLLAMA_DEBUG=1
python -c "from ollama_client import ask_ollama; ask_ollama('Test', model='phi3')"- Native HTTP API client (JSON streaming)
- Web UI (FastAPI + WebSocket)
- Response cache layer
- Automatic quality repair module (config-toggle)
- Benchmark script for latency & token throughput
- Multi-user session isolation
- Prompt template registry
- Fork the repo
- Create a feature branch: git checkout -b feat/improve-stream
- Commit with clear messages
- Open a Pull Request describing changes (screenshots if UI-related)
Choose and add a license file (e.g. MIT or Apache-2.0).
Example (MIT):
Create a LICENSE file containing the MIT license text.
Local models may hallucinate or produce inaccurate information. Always verify critical outputs before use in production or decision pipelines.
ollama pull phi3
python -c "from ollama_client import ask_ollama; print(ask_ollama('Say something encouraging.', model='phi3'))"Happy hacking! 🚀