UnForkRAG is a small, opinionated Retrieval-Augmented Generation (RAG) toolkit designed to handle both code and prose content using a unified interface. It provides:
- Fast content indexing using a 24-bit GDA (Geometric Deterministic Addressing) hash scheme (see `gda_hash.py`).
- Simple document storage and searching via a `Client` abstraction.
- Multiple API compatibility layers (REST, Chroma, Qdrant, Ollama-like endpoints).
This repository contains a lightweight server exposing a number of API endpoints (see API below) and a simple approach to generate pseudo-embeddings using GDA hashes.
Requirements: Python 3.8+ (tested on Windows PowerShell 5.1)
- Create a virtual environment and install dependencies (if any external deps are added later):

  ```powershell
  python -m venv .venv; .\.venv\Scripts\Activate.ps1
  ```

- Run the server using the provided launcher script:

  ```powershell
  python run_server.py --host 127.0.0.1 --port 8000
  ```

  Or run directly with the module:

  ```powershell
  python -m unforkrag.unfork_server --port 8000
  ```

Visit http://127.0.0.1:8000/ to confirm the server is running.
There is a small convenience script run_server.py at the project root that invokes the main server function. It forwards common args (host, port, debug) to the server.
Example:

```powershell
python run_server.py --port 8000 --debug
```

Project layout:

- `unforkrag/` - main package
  - `unfork_server.py` - Flask server exposing multiple API families
  - `gda_hash.py` - GDA 24-bit hashing + position index
  - `demo.py` - example usage and quick demo
  - other modules: `paraphrase.py`, `synsets.py`, `inferred_relations.py`, etc.
- `training/` - text files and demo index scripts
This server exposes multiple API families. The most commonly used (Ollama-compatible) endpoints are under /api/*:
- GET `/api/version`

  Returns a JSON version object. Example:

  ```json
  { "version": "0.2.0" }
  ```

- GET `/api/ps`

  Returns a list of "models" (in this project those map to collections). Example response:

  ```json
  { "models": [{ "name": "default", "model": "default", "size": 4096 }] }
  ```

- POST `/api/generate`

  Core text generation endpoint. Request JSON fields:

  - `prompt`: string
  - `model` (optional): collection/model name (defaults to `unfork`, which maps to the `default` collection)

  Example request (curl):

  ```shell
  curl -sS -X POST http://localhost:8000/api/generate -H "Content-Type: application/json" \
    -d '{"prompt": "What is GDA hashing?", "model": "default"}'
  ```

  A typical response contains `response`, token counts, and placeholder timing fields.
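The same request can be made from Python with only the standard library. This is a hedged sketch, not part of the project: the helper names are invented here, and it assumes the server is running locally on port 8000 as described above.

```python
import json
import urllib.request

BASE = "http://127.0.0.1:8000"  # assumed local server address

def build_generate_request(prompt, model="default", base=BASE):
    """Build a POST request for /api/generate with a JSON body."""
    payload = json.dumps({"prompt": prompt, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def generate(prompt, model="default"):
    """Call /api/generate and return the parsed JSON response."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.load(resp)

# Example (requires a running server):
# print(generate("What is GDA hashing?").get("response"))
```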
- POST `/api/chat`

  Chat-style completions with message history. Expects `messages` (array) in the Ollama chat format:

  ```json
  {
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain the GDA embedding method."}
    ]
  }
  ```

- POST `/api/embeddings`

  Generates embeddings. Both single and batch requests are supported. Input param: `prompt` (or `input`). This implementation returns pseudo-embeddings computed by hashing tokens with the GDA 24-bit hash and mapping them into a 768-dimensional vector with values normalized to [-1, 1].

  Example (single):

  ```shell
  curl -X POST http://localhost:8000/api/embeddings -H "Content-Type: application/json" -d '{"input": "Hello world"}'
  ```

  Example (batch):

  ```shell
  curl -X POST http://localhost:8000/api/embeddings -H "Content-Type: application/json" -d '{"input": ["Hello world", "Second text"]}'
  ```

To make generated content easier to verify and requests/responses easier to inspect, the server keeps a bounded in-memory observation log. This is useful for development and QA.
- GET `/api/observe`

  Returns recent requests and the server's responses (most recent first). Query param `limit` controls how many entries to return (default 20).

  Example:

  ```shell
  curl -sS "http://localhost:8000/api/observe?limit=5" | jq
  ```

- GET `/api/observe/<id>`

  Retrieves a specific observation by id (an `id` is included on each returned observation).

  ```shell
  curl -sS http://localhost:8000/api/observe/123e4567-e89b-12d3-a456-426614174000 | jq
  ```

Configuration:

`UNFORK_OBSERVE_MAX` (env var, default 200) controls how many observations are stored in memory.
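For scripted QA, the observation log can also be polled from Python. A minimal stdlib sketch; the helper names are illustrative, and nothing is assumed about per-entry fields beyond what the endpoint documents:

```python
import json
import urllib.parse
import urllib.request

BASE = "http://127.0.0.1:8000"  # assumed local server address

def observe_url(limit=20, base=BASE):
    """Build the /api/observe URL with an explicit limit query parameter."""
    return f"{base}/api/observe?" + urllib.parse.urlencode({"limit": limit})

def fetch_observations(limit=20):
    """Fetch the most recent observations (requires a running server)."""
    with urllib.request.urlopen(observe_url(limit)) as resp:
        return json.load(resp)

# Example (requires a running server):
# for obs in fetch_observations(limit=5):
#     print(obs)
```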
Security note: The observation log is stored in memory and intended for local development and testing. Do not enable it in untrusted public deployments, and avoid logging sensitive data.
For convenience there's a simple single-file Admin UI available at /admin (enabled by default). It displays recent observations, lets you click to view full request/response JSON, and supports auto-refresh.
Controls:

- `limit` - number of observations to fetch from the server
- Refresh - manual refresh
- Auto - toggles auto-refresh (3-second interval)
Example: Open in your browser:
http://127.0.0.1:8000/admin
Security note: The Admin UI is intended for local development and QA only. Disable it in production by setting `UNFORK_ENABLE_ADMIN=0` in the environment.
The admin UI also supports a live viewer using Server-Sent Events (SSE). Open the UI and click Start Viewer to receive new observations in real time. The SSE endpoint is available at:
/api/observe/stream
This viewer is intended for local development and quick verification; it connects via SSE and will append new observations to the top of the list as they're logged.
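Outside the browser, the stream can be consumed with a few lines of stdlib Python. The sketch below assumes each event arrives as a single `data:` line carrying a JSON body, which is the simplest SSE framing; the function names are illustrative, not part of the project:

```python
import json
import urllib.request

def parse_sse_events(lines):
    """Yield decoded JSON payloads from an iterable of SSE lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())

def stream_observations(url="http://127.0.0.1:8000/api/observe/stream"):
    """Print observations as they arrive (blocks; requires a running server)."""
    with urllib.request.urlopen(url) as resp:
        for event in parse_sse_events(resp):
            print(event)
```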
Other compatibility endpoints available:
- `/chroma/api/v1/*` - ChromaDB-like API for collection management and queries
- `/qdrant/*` - Qdrant-like endpoints for collection/point upserts and searches
- `/ollama/api/*` - older Ollama-style embedding endpoints and tags
Refer to `unfork_server.py` for detailed behavior and all supported paths.
- Embeddings are pseudo-embeddings: they are deterministic, fast to compute, and designed to be helpful for simple similarity tasks in demos and prototyping. They are not a replacement for model-based embeddings for production semantic search.
- Tokenization is performed with `tokenize_universal`, which aims to produce stable tokens for both code and prose.
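To illustrate the idea (not the project's actual implementation, which lives in `gda_hash.py` and the server module), here is a sketch of a GDA-style pseudo-embedding: a 24-bit hash picks a bucket in a 768-dimensional vector and contributes a value scaled into [-1, 1]. The SHA-256-based hash and the regex tokenizer below are placeholders for the project's GDA hash and `tokenize_universal`:

```python
import hashlib
import re

DIM = 768  # vector width used by the /api/embeddings endpoint

def toy_hash24(token):
    """Stand-in 24-bit hash (the project uses its own GDA scheme)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:3], "big")  # 0 .. 2**24 - 1

def pseudo_embed(text):
    """Map tokens into a DIM-wide vector with values in [-1, 1]."""
    vec = [0.0] * DIM
    for tok in re.findall(r"\w+", text.lower()):  # naive tokenizer stand-in
        h = toy_hash24(tok)
        idx = h % DIM                           # bucket chosen by the hash
        vec[idx] += (h / (2**24 - 1)) * 2 - 1   # scale into [-1, 1]
    # clamp so repeated tokens cannot push values outside [-1, 1]
    return [max(-1.0, min(1.0, v)) for v in vec]

def cosine(a, b):
    """Plain cosine similarity, enough for simple demo comparisons."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0
```

Because the vectors are deterministic, identical texts always map to identical vectors (cosine ~1.0), which is what makes this approach workable for quick similarity checks in demos.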
- Run the server locally and exercise endpoints via `curl`, PowerShell `Invoke-RestMethod`, Postman, or your Python client.
- To add support for a real model back-end: implement a generator that uses a model's API and replace the simple search-based generation in `/api/generate` and `/api/chat`.
If you'd like, open a PR with fixes or feature requests — keep changes small and well-tested.
- If endpoints return 404 for collections, ensure you added documents using `/api/add` or the Chroma/Qdrant endpoints.
- Server logs are printed to stdout; running with `--debug` enables Flask debug mode for easier diagnostics.
This project uses an MIT-style license. Update LICENSE as appropriate for your project.