A Discourse forum bot powered by any OpenAI-compatible LLM backend (LocalAI, Ollama, vLLM, etc.) that responds to mentions and searches external knowledge sources.
- Responds to @mentions in your Discourse forum
- Works with any OpenAI-compatible API (LocalAI, Ollama, vLLM, etc.)
- Automatic model loading with custom settings via LocalAI's `/models/apply` endpoint
- Web search integration via the Ollama Web Search API
- Rate limiting to avoid spamming the forum
- Persists processed notifications to avoid duplicate replies
- Runs entirely in Docker with LocalAI included
- Copy `.env.example` to `.env` and configure your settings:

  ```bash
  cp .env.example .env
  ```

- Build and run with Docker Compose:

  ```bash
  docker-compose up -d
  ```

- Check the logs:

  ```bash
  docker-compose logs -f
  ```
| Variable | Description |
|---|---|
| `DISCOURSE_HOST` | Your Discourse forum URL (e.g., `https://forum.example.com`) |
| `DISCOURSE_API_KEY` | API key from Discourse Admin > API |
| `DISCOURSE_USERNAME` | Bot's username on the forum |
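A minimal `.env` containing just these required settings might look like this (all values are placeholders):

```bash
# .env - placeholder values, replace with your own
DISCOURSE_HOST=https://forum.example.com
DISCOURSE_API_KEY=your-api-key-here
DISCOURSE_USERNAME=discussy
```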
| Variable | Default | Description |
|---|---|---|
| `BOT_MENTION_TRIGGERS` | `@discussy` | Comma-separated list of @mentions the bot responds to |
| `POLL_INTERVAL_MS` | `30000` | How often to check for new mentions (milliseconds) |
| `MIN_REPLY_INTERVAL_MS` | `120000` | Minimum time between replies in milliseconds (prevents spamming) |
| `BOT_MAX_RESPONSE_LENGTH` | `2000` | Max response length in characters |
| `DEBUG_MODE` | `false` | If `true`, log responses to the console instead of posting to Discourse |
| `BOT_SYSTEM_PROMPT` | (built-in) | Custom system prompt for the bot's personality |
| `BOT_BLOCKED_PATTERNS` | (built-in) | Pipe-separated regex patterns to filter from responses |
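For example, to answer two different mentions and strip unwanted phrases from replies (the trigger names and patterns below are purely illustrative):

```bash
# .env - illustrative bot behavior settings
BOT_MENTION_TRIGGERS=@discussy,@helper
BOT_BLOCKED_PATTERNS=as an AI language model|ignore previous instructions
DEBUG_MODE=true   # log replies to the console while testing
```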
Controls how the bot creates and responds to threads.
| Variable | Default | Description |
|---|---|---|
| `THREAD_MODE` | `any` | Thread handling mode (see below) |
| `THREAD_CATEGORY` | `1` | Category ID for new threads (defaults to Uncategorized) |
| `THREAD_TITLE` | `Daily Discussion Thread - {date}` | Title template for new threads |
| `THREAD_CONTENT` | (built-in) | Content for the first post in new threads |
| `THREAD_MAX_AGE_HOURS` | `48` | Only reply to bot-started threads within this age |
Thread Mode Options:
| Mode | Description |
|---|---|
| `startup` | Creates a new thread when the bot starts. Only replies to threads it started (within 48 hours). |
| `daily` | Creates a new thread every day at 08:00 UTC. Only replies to threads it started (within 48 hours). |
| `weekly` | Creates a new thread every Monday at 08:00 UTC. Only replies to threads it started (within 48 hours). |
| `any` | Replies to any thread where the bot is mentioned (default behavior). |
Title Template Placeholders:
- `{date}` - Current date in YYYY-MM-DD format
- `{weekday}` - Current day of the week (e.g., "Monday")
- `{hhmm}` - Current time in HHMM format (UTC, e.g., "1430" for 2:30 PM)
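For example, a daily discussion thread could be configured like this (the category ID is illustrative):

```bash
# .env - illustrative daily-thread setup
THREAD_MODE=daily
THREAD_CATEGORY=5
THREAD_TITLE=Daily Discussion Thread - {weekday} {date}
THREAD_MAX_AGE_HOURS=48
```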
The bot works with any OpenAI-compatible API. Settings are passed to LocalAI via the REST API when the model is loaded.
| Variable | Default | Description |
|---|---|---|
| `LLM_HOST` | `http://localhost:8080/v1` | OpenAI-compatible API endpoint |
| `LLM_MODEL` | `gpt-4` | Model name to use for API calls |
| `LLM_MODEL_URL` | - | LocalAI model URL for `/models/apply` (see below) |
| `LLM_API_KEY` | - | API key (optional; LocalAI doesn't require one) |
| `WEB_SEARCH_API_KEY` | - | API key from ollama.com/settings/keys |
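For example, to point the bot at a local Ollama server instead of LocalAI (the port and model name below are illustrative; Ollama serves an OpenAI-compatible API under `/v1`):

```bash
# .env - illustrative Ollama setup
LLM_HOST=http://localhost:11434/v1
LLM_MODEL=llama3.2
# LLM_API_KEY can stay unset for local backends
```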
These parameters control how the LLM generates responses. When LLM_MODEL_URL is set, these are passed to LocalAI via the /models/apply REST API endpoint when loading the model.
| Variable | Default | Description |
|---|---|---|
| `LLM_TEMPERATURE` | `0.7` | Sampling temperature (0.0-2.0). Higher = more creative/random, lower = more focused. |
| `LLM_TOP_P` | `0.9` | Nucleus sampling threshold (0.0-1.0). Lower values = more focused on likely tokens. |
| `LLM_TOP_K` | `40` | Consider only the top K most likely tokens. |
| `LLM_MAX_TOKENS` | `1024` | Maximum number of tokens to generate (0 = unlimited). |
| `LLM_CONTEXT_SIZE` | `2048` | Maximum context window size in tokens. Larger values use more memory but allow longer conversations. |
| `LLM_REPEAT_PENALTY` | `1.1` | Penalty for repeating tokens (1.0 = no penalty). Higher values discourage repetition. |
| `LLM_THREADS` | `0` | CPU threads for inference (0 = auto-detect). |
Temperature Guidelines:
- `0.3-0.5`: Very focused, factual responses
- `0.7`: Balanced (default)
- `0.9-1.2`: More creative, playful responses
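To illustrate where these values end up at request time, here is a minimal TypeScript sketch of an OpenAI-compatible chat completion call (a hand-written example, not the actual `llm-client.ts` code):

```typescript
// Minimal sketch: pass sampling settings to an OpenAI-compatible backend.
// Assumes Node 18+ (global fetch) and the env vars documented above.
async function complete(prompt: string): Promise<string> {
  const host = process.env.LLM_HOST ?? "http://localhost:8080/v1";
  const res = await fetch(`${host}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: process.env.LLM_MODEL ?? "gpt-4",
      messages: [{ role: "user", content: prompt }],
      temperature: Number(process.env.LLM_TEMPERATURE ?? 0.7),
      top_p: Number(process.env.LLM_TOP_P ?? 0.9),
      max_tokens: Number(process.env.LLM_MAX_TOKENS ?? 1024),
    }),
  });
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```

Note that `top_k`, `repeat_penalty`, and `context_size` are not part of the standard OpenAI request schema, which is why the bot applies them as model-level overrides through `/models/apply` when `LLM_MODEL_URL` is set.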
When LLM_MODEL_URL is set, the bot automatically loads the model on startup using LocalAI's /models/apply endpoint. This allows you to:
- Download models from HuggingFace or the LocalAI gallery
- Configure the model with your custom system prompt
- Set all generation parameters at the model level
Example Model URLs:

```bash
# List available models:
curl http://localhost:8080/models/available | jq -r '.[].name'

# LocalAI Gallery models (recommended):
LLM_MODEL_URL=llama-3.2-3b-instruct:q4_k_m   # Small, fast (1.9 GB)
LLM_MODEL_URL=llama-3.2-3b-instruct:q8_0     # Higher quality
LLM_MODEL_URL=llama-3.3-70b-instruct         # Large, requires lots of RAM
```

How it works:
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Your .env    │  →  │   Bot startup   │  →  │  LocalAI REST   │
│                 │     │    (Node.js)    │     │  /models/apply  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
The bot reads your .env file and sends a POST request to LocalAI:
```json
{
  "id": "llama-3.2-3b-instruct:q4_k_m",
  "name": "gpt-4",
  "overrides": {
    "system_prompt": "You are a helpful assistant...",
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "max_tokens": 2048,
    "context_size": 4096,
    "repeat_penalty": 1.1
  }
}
```
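A minimal TypeScript sketch of that request (field values mirror the payload above; the bot's actual startup code may differ):

```typescript
// Sketch: load a model with overrides via LocalAI's /models/apply.
// /models/apply lives at the API root, not under /v1.
async function applyModel(): Promise<void> {
  const base = (process.env.LLM_HOST ?? "http://localhost:8080/v1")
    .replace(/\/v1$/, "");
  const res = await fetch(`${base}/models/apply`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      id: process.env.LLM_MODEL_URL,          // e.g. llama-3.2-3b-instruct:q4_k_m
      name: process.env.LLM_MODEL ?? "gpt-4", // alias used for later API calls
      overrides: {
        temperature: Number(process.env.LLM_TEMPERATURE ?? 0.7),
        top_k: Number(process.env.LLM_TOP_K ?? 40),
        repeat_penalty: Number(process.env.LLM_REPEAT_PENALTY ?? 1.1),
      },
    }),
  });
  if (!res.ok) throw new Error(`models/apply failed: ${res.status}`);
}
```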
- **Startup**: The bot connects to LocalAI and loads the configured model with your settings
- **Polling**: The bot polls Discourse for new notifications every 30 seconds (configurable)
- **Mention Detection**: When someone @mentions the bot (e.g., `@discussy what is X?`), it:
  - Fetches the post content
  - Gets reply chain context (follows the conversation thread)
  - Searches the forum for relevant existing posts
  - Performs web search for external information (if API key configured and question needs it)
  - Generates a response using the LLM
- **Rate Limiting**: The bot waits between replies to avoid spamming (configurable via `MIN_REPLY_INTERVAL_MS`)
- **Persistence**: Processed notification IDs are saved to disk, so the bot won't reply twice to the same mention even after restarts
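The persistence step can be pictured as a small JSON file of seen IDs; this sketch is illustrative (the file name and helpers are hypothetical, not the bot's actual storage format):

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Hypothetical store file; the real bot may use a different format/path.
const STORE = "processed-notifications.json";

function loadProcessed(): Set<number> {
  return existsSync(STORE)
    ? new Set(JSON.parse(readFileSync(STORE, "utf8")) as number[])
    : new Set<number>();
}

function markProcessed(seen: Set<number>, id: number): void {
  seen.add(id); // remember this notification across restarts
  writeFileSync(STORE, JSON.stringify([...seen]));
}
```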
The bot automatically searches your Discourse forum for relevant posts before generating a response. This helps provide context-aware answers based on existing discussions.
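For reference, Discourse exposes search over `/search.json`; here is a sketch of such a lookup using the standard `Api-Key`/`Api-Username` headers (not the bot's actual `discourse-client.ts` code):

```typescript
// Sketch: query the forum for posts related to the user's question.
async function searchForum(query: string): Promise<unknown[]> {
  const url = `${process.env.DISCOURSE_HOST}/search.json?q=${encodeURIComponent(query)}`;
  const res = await fetch(url, {
    headers: {
      "Api-Key": process.env.DISCOURSE_API_KEY ?? "",
      "Api-Username": process.env.DISCOURSE_USERNAME ?? "",
    },
  });
  const data = (await res.json()) as { posts?: unknown[] };
  return data.posts ?? [];
}
```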
For questions that may need external knowledge, the bot can search the web using the Ollama Web Search API. Web search is triggered when questions contain patterns like "what is", "how to", "latest", etc. Get your API key from ollama.com/settings/keys and set WEB_SEARCH_API_KEY.
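The trigger check can be as simple as a pattern match; the patterns below are only the examples named above (the bot's full internal list may differ):

```typescript
// Illustrative: decide whether a question warrants a web search.
const WEB_SEARCH_TRIGGERS = [/\bwhat is\b/i, /\bhow to\b/i, /\blatest\b/i];

function needsWebSearch(question: string): boolean {
  return (
    Boolean(process.env.WEB_SEARCH_API_KEY) &&
    WEB_SEARCH_TRIGGERS.some((re) => re.test(question))
  );
}
```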
The docker-compose.yml is configured for ARM64 architecture:

```yaml
localai:
  image: localai/localai:latest-aio-cpu
  platform: linux/arm64
```

If you're not on ARM64, remove the platform line or change it to:

```yaml
localai:
  image: localai/localai:latest-aio-cpu
  platform: linux/amd64
```

To build and run the bot locally without Docker:

```bash
npm install
npm run build
npm start
```

To rebuild the Docker image:

```bash
docker-compose build
```

Project structure:

```
├── src/
│ ├── index.ts # Entry point
│ ├── bot.ts # Main bot logic
│ ├── config.ts # Configuration loader
│ ├── discourse-client.ts # Discourse API client
│ └── llm-client.ts # LLM API client (OpenAI-compatible)
├── .github/workflows/ # CI/CD workflows
├── docker-compose.yml
├── Dockerfile
├── .env.example
└── package.json
```