Ollama + WideMemory

Finally, a llama that doesn't pretend every conversation is the first one.

Fork maintainer: Radu Cioplea
Email: radu@cioplea.com
URL: www.eyepaq.com
Date: November 2025

This is a fork of Ollama with an integrated persistent memory layer powered by widemem-ai (widemem.ai).

widemem.ai — Memory Infrastructure for LLM Agents

What this fork adds

Standard Ollama has no memory between conversations. This fork adds a memory middleware that gives any local model long-term, intelligent memory — facts are extracted from conversations, stored locally, and automatically recalled in future chats.

How it works

User message → Memory Middleware → search widemem for relevant facts
                                 → inject facts into system prompt
                                 → forward to model (ChatHandler)
                                 → store new facts from conversation (async)

Key details:

  • Opt-in via the OLLAMA_MEMORY_URL environment variable — without it, Ollama behaves exactly like upstream
  • Fails open: if the memory sidecar is down, chat works normally with no errors
  • Memory search adds ~2s latency; fact storage happens asynchronously after the response
  • Works on the native /api/chat endpoint
  • All data stays local — no cloud calls required
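The recall-and-inject flow above can be sketched as a small function. This is an illustrative Python model of the logic (the fork's actual middleware is written in Go); `inject_memories` and the `search` callback are hypothetical names, not widemem's real API:

```python
def inject_memories(messages, search):
    """Return messages with recalled facts prepended as a system message.

    `search` stands in for a query to the widemem sidecar: it takes the
    latest user message and returns a list of fact strings. Any failure
    means fail open and return the conversation unchanged, exactly as
    upstream Ollama would behave.
    """
    if not messages:
        return messages
    try:
        facts = search(messages[-1]["content"])
    except Exception:
        return messages  # sidecar down: chat works normally, no errors
    if not facts:
        return messages  # nothing relevant recalled: forward as-is
    system = {
        "role": "system",
        "content": "Known facts about the user:\n- " + "\n- ".join(facts),
    }
    return [system] + messages
```

For example, `inject_memories([{"role": "user", "content": "Where do I live?"}], lambda q: ["User lives in Bucharest"])` yields the same conversation with a system message of recalled facts prepended, while a raising `search` callback leaves it untouched.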

Files changed from upstream

  • middleware/memory.go (new file): Go middleware that intercepts /api/chat, queries the widemem sidecar for relevant memories, injects them as a system message, and stores new facts asynchronously
  • server/routes.go (one-line change): added middleware.MemoryMiddleware() before s.ChatHandler on the /api/chat route

Setup

1. Install and start the widemem-ai sidecar:

pip install widemem-ai[server,ollama,sentence-transformers]
python -m widemem.server
# Runs on port 11435 by default
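Before moving on, you can confirm the sidecar is reachable. This is a best-effort probe of the root URL, not part of widemem's documented API (the exact health endpoint is not specified here):

```python
import urllib.error
import urllib.request

def sidecar_up(url="http://localhost:11435", timeout=2):
    """Return True if something is listening at the sidecar URL."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server responded, even if with an error status
    except OSError:
        return False  # connection refused or timed out: not running
```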

2. Build and run this Ollama fork with memory enabled:

cd ollama
OLLAMA_MEMORY_URL=http://localhost:11435 go run . serve

3. Chat — memories persist across conversations:

# First conversation
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "My name is Radu and I live in Bucharest"}],
  "stream": false
}'

# Later conversation — the model will remember
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Where do I live?"}],
  "stream": false
}'

Configuration

  • OLLAMA_MEMORY_URL (default: unset, memory disabled): URL of the widemem sidecar
  • WIDEMEM_LLM_PROVIDER (default: ollama): LLM provider used for fact extraction
  • WIDEMEM_LLM_MODEL (default: llama3.2): model used for fact extraction
  • WIDEMEM_EMBEDDING_PROVIDER (default: sentence-transformers): embedding provider (local, no API key required)
  • WIDEMEM_DATA_PATH (default: ~/.widemem/data): where memories are stored on disk
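The settings above can be resolved into one config object. A hypothetical sketch (in the real fork, OLLAMA_MEMORY_URL is read by the Go middleware and the WIDEMEM_* variables by the Python sidecar; `memory_config` is an illustrative helper, not part of either codebase):

```python
import os

def memory_config():
    """Resolve the memory-layer settings from the environment.

    Defaults mirror the configuration table: OLLAMA_MEMORY_URL has no
    default, so leaving it unset disables the memory layer and Ollama
    behaves exactly like upstream.
    """
    url = os.environ.get("OLLAMA_MEMORY_URL")  # opt-in: no default
    return {
        "enabled": url is not None,
        "memory_url": url,
        "llm_provider": os.environ.get("WIDEMEM_LLM_PROVIDER", "ollama"),
        "llm_model": os.environ.get("WIDEMEM_LLM_MODEL", "llama3.2"),
        "embedding_provider": os.environ.get(
            "WIDEMEM_EMBEDDING_PROVIDER", "sentence-transformers"),
        "data_path": os.environ.get(
            "WIDEMEM_DATA_PATH", os.path.expanduser("~/.widemem/data")),
    }
```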

Everything below is the original Ollama README.


Start building with open models.

Download

macOS

curl -fsSL https://ollama.com/install.sh | sh

or download manually

Windows

irm https://ollama.com/install.ps1 | iex

or download manually

Linux

curl -fsSL https://ollama.com/install.sh | sh

Manual install instructions

Docker

The official Ollama Docker image ollama/ollama is available on Docker Hub.

Libraries

Community

Get started

ollama

You'll be prompted to run a model or connect Ollama to your existing agents or applications, such as claude, codex, openclaw, and more.

Coding

To launch a specific integration:

ollama launch claude

Supported integrations include Claude Code, Codex, Droid, and OpenCode.

AI assistant

Use OpenClaw to turn Ollama into a personal AI assistant across WhatsApp, Telegram, Slack, Discord, and more:

ollama launch openclaw

Chat with a model

Run and chat with Gemma 3:

ollama run gemma3

See ollama.com/library for the full list.

See the quickstart guide for more details.

REST API

Ollama has a REST API for running and managing models.

curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'

See the API documentation for all endpoints.

Python

pip install ollama
from ollama import chat

response = chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response.message.content)

JavaScript

npm i ollama
import ollama from "ollama";

const response = await ollama.chat({
  model: "gemma3",
  messages: [{ role: "user", content: "Why is the sky blue?" }],
});
console.log(response.message.content);

Supported backends

  • llama.cpp project founded by Georgi Gerganov.

Documentation

Community Integrations

Want to add your project? Open a pull request.

Chat Interfaces

Web

Desktop

  • Dify.AI - LLM app development platform
  • AnythingLLM - All-in-one AI app for Mac, Windows, and Linux
  • Maid - Cross-platform mobile and desktop client
  • Witsy - AI desktop app for Mac, Windows, and Linux
  • Cherry Studio - Multi-provider desktop client
  • Ollama App - Multi-platform client for desktop and mobile
  • PyGPT - AI desktop assistant for Linux, Windows, and Mac
  • Alpaca - GTK4 client for Linux and macOS
  • SwiftChat - Cross-platform including iOS, Android, and Apple Vision Pro
  • Enchanted - Native macOS and iOS client
  • RWKV-Runner - Multi-model desktop runner
  • Ollama Grid Search - Evaluate and compare models
  • macai - macOS client for Ollama and ChatGPT
  • AI Studio - Multi-provider desktop IDE
  • Reins - Parameter tuning and reasoning model support
  • ConfiChat - Privacy-focused with optional encryption
  • LLocal.in - Electron desktop client
  • MindMac - AI chat client for Mac
  • Msty - Multi-model desktop client
  • BoltAI for Mac - AI chat client for Mac
  • IntelliBar - AI-powered assistant for macOS
  • Kerlig AI - AI writing assistant for macOS
  • Hillnote - Markdown-first AI workspace
  • Perfect Memory AI - Productivity AI personalized by screen and meeting history

Mobile

SwiftChat, Enchanted, Maid, Ollama App, Reins, and ConfiChat listed above also support mobile platforms.

Code Editors & Development

Libraries & SDKs

Frameworks & Agents

RAG & Knowledge Bases

  • RAGFlow - RAG engine based on deep document understanding
  • R2R - Open-source RAG engine
  • MaxKB - Ready-to-use RAG chatbot
  • Minima - On-premises or fully local RAG
  • Chipper - AI interface with Haystack RAG
  • ARGO - RAG and deep research on Mac/Windows/Linux
  • Archyve - RAG-enabling document library
  • Casibase - AI knowledge base with RAG and SSO
  • BrainSoup - Native client with RAG and multi-agent automation

Bots & Messaging

Terminal & CLI

Productivity & Apps

Observability & Monitoring

  • Opik - Debug, evaluate, and monitor LLM applications
  • OpenLIT - OpenTelemetry-native monitoring for Ollama and GPUs
  • Lunary - LLM observability with analytics and PII masking
  • Langfuse - Open source LLM observability
  • HoneyHive - AI observability and evaluation for agents
  • MLflow Tracing - Open source LLM observability

Database & Embeddings

Infrastructure & Deployment

Cloud

Package Managers
