Ollama + WideMemory

Finally, a llama that doesn't pretend every conversation is the first one.

Fork maintainer: Radu Cioplea
Email: radu@cioplea.com
URL: www.eyepaq.com
Date: November 2025

This is a fork of Ollama with an integrated persistent memory layer powered by widemem-ai (widemem.ai).

widemem.ai — Memory Infrastructure for LLM Agents

What this fork adds

Standard Ollama has no memory between conversations. This fork adds a memory middleware that gives any local model long-term, intelligent memory — facts are extracted from conversations, stored locally, and automatically recalled in future chats.

How it works

User message → Memory Middleware → search widemem for relevant facts
                                 → inject facts into system prompt
                                 → forward to model (ChatHandler)
                                 → store new facts from conversation (async)

Key details:

  • Opt-in via the OLLAMA_MEMORY_URL environment variable — without it, Ollama behaves exactly like upstream
  • Fails open: if the memory sidecar is down, chat works normally with no errors
  • Memory search adds ~2s latency; fact storage happens asynchronously after the response
  • Works on the native /api/chat endpoint
  • All data stays local — no cloud calls required
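The recall-and-inject flow above can be sketched as a small function. This is an illustrative Python model of the logic (the fork's actual middleware is written in Go); `inject_memories` and the `search` callback are hypothetical names, not widemem's real API:

```python
def inject_memories(messages, search):
    """Return messages with recalled facts prepended as a system message.

    `search` stands in for a query to the widemem sidecar: it takes the
    latest user message and returns a list of fact strings. Any failure
    means fail open and return the conversation unchanged, exactly as
    upstream Ollama would behave.
    """
    if not messages:
        return messages
    try:
        facts = search(messages[-1]["content"])
    except Exception:
        return messages  # sidecar down: chat works normally, no errors
    if not facts:
        return messages  # nothing relevant recalled: forward as-is
    system = {
        "role": "system",
        "content": "Known facts about the user:\n- " + "\n- ".join(facts),
    }
    return [system] + messages
```

For example, `inject_memories([{"role": "user", "content": "Where do I live?"}], lambda q: ["User lives in Bucharest"])` yields the same conversation with a system message of recalled facts prepended, while a raising `search` callback leaves it untouched.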

Files changed from upstream

  • middleware/memory.go (new file): Go middleware that intercepts /api/chat, queries the widemem sidecar for relevant memories, injects them as a system message, and stores new facts asynchronously
  • server/routes.go (one-line change): added middleware.MemoryMiddleware() before s.ChatHandler on the /api/chat route

Setup

1. Install and start the widemem-ai sidecar:

pip install widemem-ai[server,ollama,sentence-transformers]
python -m widemem.server
# Runs on port 11435 by default
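Before moving on, you can confirm the sidecar is reachable. This is a best-effort probe of the root URL, not part of widemem's documented API (the exact health endpoint is not specified here):

```python
import urllib.error
import urllib.request

def sidecar_up(url="http://localhost:11435", timeout=2):
    """Return True if something is listening at the sidecar URL."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server responded, even if with an error status
    except OSError:
        return False  # connection refused or timed out: not running
```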

2. Build and run this Ollama fork with memory enabled:

cd ollama
OLLAMA_MEMORY_URL=http://localhost:11435 go run . serve

3. Chat — memories persist across conversations:

# First conversation
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "My name is Radu and I live in Bucharest"}],
  "stream": false
}'

# Later conversation — the model will remember
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Where do I live?"}],
  "stream": false
}'

Configuration

  • OLLAMA_MEMORY_URL (default: unset, memory disabled): URL of the widemem sidecar
  • WIDEMEM_LLM_PROVIDER (default: ollama): LLM provider used for fact extraction
  • WIDEMEM_LLM_MODEL (default: llama3.2): model used for fact extraction
  • WIDEMEM_EMBEDDING_PROVIDER (default: sentence-transformers): embedding provider (local, no API key required)
  • WIDEMEM_DATA_PATH (default: ~/.widemem/data): where memories are stored on disk
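The settings above can be resolved into one config object. A hypothetical sketch (in the real fork, OLLAMA_MEMORY_URL is read by the Go middleware and the WIDEMEM_* variables by the Python sidecar; `memory_config` is an illustrative helper, not part of either codebase):

```python
import os

def memory_config():
    """Resolve the memory-layer settings from the environment.

    Defaults mirror the configuration table: OLLAMA_MEMORY_URL has no
    default, so leaving it unset disables the memory layer and Ollama
    behaves exactly like upstream.
    """
    url = os.environ.get("OLLAMA_MEMORY_URL")  # opt-in: no default
    return {
        "enabled": url is not None,
        "memory_url": url,
        "llm_provider": os.environ.get("WIDEMEM_LLM_PROVIDER", "ollama"),
        "llm_model": os.environ.get("WIDEMEM_LLM_MODEL", "llama3.2"),
        "embedding_provider": os.environ.get(
            "WIDEMEM_EMBEDDING_PROVIDER", "sentence-transformers"),
        "data_path": os.environ.get(
            "WIDEMEM_DATA_PATH", os.path.expanduser("~/.widemem/data")),
    }
```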

Everything below is the original Ollama README.


Start building with open models.

Download

macOS

curl -fsSL https://ollama.com/install.sh | sh

or download manually

Windows

irm https://ollama.com/install.ps1 | iex

or download manually

Linux

curl -fsSL https://ollama.com/install.sh | sh

Manual install instructions

Docker

The official Ollama Docker image ollama/ollama is available on Docker Hub.

Libraries

Community

Get started

ollama

You'll be prompted to run a model or connect Ollama to your existing agents or applications, such as claude, codex, openclaw, and more.

Coding

To launch a specific integration:

ollama launch claude

Supported integrations include Claude Code, Codex, Droid, and OpenCode.

AI assistant

Use OpenClaw to turn Ollama into a personal AI assistant across WhatsApp, Telegram, Slack, Discord, and more:

ollama launch openclaw

Chat with a model

Run and chat with Gemma 3:

ollama run gemma3

See ollama.com/library for the full list.

See the quickstart guide for more details.

REST API

Ollama has a REST API for running and managing models.

curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'

See the API documentation for all endpoints.

Python

pip install ollama
from ollama import chat

response = chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response.message.content)

JavaScript

npm i ollama
import ollama from "ollama";

const response = await ollama.chat({
  model: "gemma3",
  messages: [{ role: "user", content: "Why is the sky blue?" }],
});
console.log(response.message.content);

Supported backends

  • llama.cpp project founded by Georgi Gerganov.

Documentation

Community Integrations

Want to add your project? Open a pull request.

Chat Interfaces

Web

Desktop

  • Dify.AI - LLM app development platform
  • AnythingLLM - All-in-one AI app for Mac, Windows, and Linux
  • Maid - Cross-platform mobile and desktop client
  • Witsy - AI desktop app for Mac, Windows, and Linux
  • Cherry Studio - Multi-provider desktop client
  • Ollama App - Multi-platform client for desktop and mobile
  • PyGPT - AI desktop assistant for Linux, Windows, and Mac
  • Alpaca - GTK4 client for Linux and macOS
  • SwiftChat - Cross-platform including iOS, Android, and Apple Vision Pro
  • Enchanted - Native macOS and iOS client
  • RWKV-Runner - Multi-model desktop runner
  • Ollama Grid Search - Evaluate and compare models
  • macai - macOS client for Ollama and ChatGPT
  • AI Studio - Multi-provider desktop IDE
  • Reins - Parameter tuning and reasoning model support
  • ConfiChat - Privacy-focused with optional encryption
  • LLocal.in - Electron desktop client
  • MindMac - AI chat client for Mac
  • Msty - Multi-model desktop client
  • BoltAI for Mac - AI chat client for Mac
  • IntelliBar - AI-powered assistant for macOS
  • Kerlig AI - AI writing assistant for macOS
  • Hillnote - Markdown-first AI workspace
  • Perfect Memory AI - Productivity AI personalized by screen and meeting history

Mobile

SwiftChat, Enchanted, Maid, Ollama App, Reins, and ConfiChat listed above also support mobile platforms.

Code Editors & Development

Libraries & SDKs

Frameworks & Agents

RAG & Knowledge Bases

  • RAGFlow - RAG engine based on deep document understanding
  • R2R - Open-source RAG engine
  • MaxKB - Ready-to-use RAG chatbot
  • Minima - On-premises or fully local RAG
  • Chipper - AI interface with Haystack RAG
  • ARGO - RAG and deep research on Mac/Windows/Linux
  • Archyve - RAG-enabling document library
  • Casibase - AI knowledge base with RAG and SSO
  • BrainSoup - Native client with RAG and multi-agent automation

Bots & Messaging

Terminal & CLI

Productivity & Apps

Observability & Monitoring

  • Opik - Debug, evaluate, and monitor LLM applications
  • OpenLIT - OpenTelemetry-native monitoring for Ollama and GPUs
  • Lunary - LLM observability with analytics and PII masking
  • Langfuse - Open source LLM observability
  • HoneyHive - AI observability and evaluation for agents
  • MLflow Tracing - Open source LLM observability

Database & Embeddings

Infrastructure & Deployment

Cloud

Package Managers
