
UnForkRAG

UnForkRAG is a small, opinionated Retrieval-Augmented Generation (RAG) toolkit designed to handle both code and prose content using a unified interface. It provides:

  • Fast content indexing using a 24-bit GDA (Geometric Deterministic Addressing) hash scheme (see gda_hash.py).
  • Simple document storage and searching via a Client abstraction.
  • Multiple API compatibility layers (REST, Chroma, Qdrant, Ollama-like endpoints).

This repository contains a lightweight server exposing a number of API endpoints (see API below) and a simple approach to generate pseudo-embeddings using GDA hashes.


Quickstart ✅

Requirements: Python 3.8+ (tested on Windows PowerShell 5.1)

  1. Create a virtual environment and install dependencies (if any external deps are added later):
python -m venv .venv; .\.venv\Scripts\Activate.ps1
  2. Run the server using the provided launcher script:
python run_server.py --host 127.0.0.1 --port 8000

Or run directly with the module:

python -m unforkrag.unfork_server --port 8000

Visit http://127.0.0.1:8000/ to confirm the server is running.


One-line Launcher (run_server.py) 🔧

There is a small convenience script run_server.py at the project root that invokes the main server function. It forwards common args (host, port, debug) to the server.

Example:

python run_server.py --port 8000 --debug
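The launcher is essentially a thin argparse wrapper. A minimal sketch of what such a script might look like follows; the main() entry point imported from unforkrag.unfork_server is an assumption, not the project's confirmed API:

```python
# Hypothetical sketch of a run_server.py-style launcher that forwards
# host/port/debug to the server's entry point.
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Launch the UnForkRAG server")
    parser.add_argument("--host", default="127.0.0.1", help="bind address")
    parser.add_argument("--port", type=int, default=8000, help="listen port")
    parser.add_argument("--debug", action="store_true",
                        help="enable Flask debug mode")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    # The real entry point lives in unforkrag.unfork_server; the exact
    # function name here is an assumption.
    from unforkrag.unfork_server import main
    main(host=args.host, port=args.port, debug=args.debug)
```

Keeping the launcher this thin means all behavior stays in the package and the script only handles CLI plumbing.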

Project Layout 📁

  • unforkrag/ - main package
    • unfork_server.py - Flask server exposing multiple API families
    • gda_hash.py - GDA 24-bit hashing + position index
    • demo.py - example usage and quick demo
    • other modules: paraphrase.py, synsets.py, inferred_relations.py, etc.
  • training/ - text files and demo index scripts

API Reference 🔍

This server exposes multiple API families. The most commonly used (Ollama-compatible) endpoints are under /api/*:

  • GET /api/version
    Returns a JSON version object. Example:
{ "version": "0.2.0" }
  • GET /api/ps
    Returns a list of 'models' (in this project those map to collections). Example response:
{ "models": [{ "name": "default", "model": "default", "size": 4096 } ] }
  • POST /api/generate
    Core text generation endpoint. Request JSON fields:
    • prompt: string
    • model (optional): collection/model name (defaults to unfork, which uses the default collection)

Example request (curl):

curl -sS -X POST http://localhost:8000/api/generate -H "Content-Type: application/json" \
  -d '{"prompt": "What is GDA hashing?", "model": "default"}'

A typical response contains the response text plus placeholder fields for token counts and timings.
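As a rough illustration, a stdlib-only Python client for this endpoint might look like the following. The request fields match the shape documented above; the helper names are ours, not part of the project:

```python
import json
import urllib.request


def build_generate_request(prompt, model="default",
                           base_url="http://localhost:8000"):
    """Build an urllib Request for POST /api/generate."""
    payload = json.dumps({"prompt": prompt, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="default"):
    # Requires a running server; returns the parsed JSON response body.
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.load(resp)
```

Separating request construction from the network call makes the payload shape easy to unit-test without a live server.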

  • POST /api/chat
    Chat-style completions with message history. Expects messages (array) in the Ollama/chat format:
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain the GDA embedding method."}
  ]
}
  • POST /api/embeddings
    Generates embeddings; both single and batch requests are supported. The input parameter is prompt (or input). This implementation returns pseudo-embeddings computed by hashing tokens with the GDA 24-bit hash and mapping them into a 768-dimensional vector with values normalized to [-1, 1].

Example (single):

curl -X POST http://localhost:8000/api/embeddings -H "Content-Type: application/json" -d '{"input": "Hello world"}'

Example (batch):

curl -X POST http://localhost:8000/api/embeddings -H "Content-Type: application/json" -d '{"input": ["Hello world", "Second text"]}'
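The hash-to-vector mapping can be pictured roughly as follows. This is an illustrative stand-in, not the project's gda_hash implementation: it substitutes zlib.crc32 masked to 24 bits for the real GDA hash and uses plain whitespace tokenization.

```python
import zlib

DIM = 768  # dimensionality of the pseudo-embedding


def pseudo_hash24(token):
    # Stand-in for the GDA 24-bit hash: CRC32 masked to 24 bits.
    return zlib.crc32(token.encode("utf-8")) & 0xFFFFFF


def pseudo_embedding(text, dim=DIM):
    """Map token hashes into a dim-length vector with values in [-1, 1]."""
    vec = [0.0] * dim
    for token in text.split():
        h = pseudo_hash24(token)
        # Bucket by hash modulo the dimension; rescale the 24-bit value
        # into [-1, 1] and accumulate.
        vec[h % dim] += (h / 0xFFFFFF) * 2.0 - 1.0
    # Clamp so the result always stays within [-1, 1].
    return [max(-1.0, min(1.0, v)) for v in vec]
```

Because the hash is deterministic, identical inputs always produce identical vectors, which is what makes this approach usable for simple similarity demos.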

Observability / Request Tracing 🔭

For easier verification of generated content and to inspect requests/responses, the server keeps a bounded in-memory observation log. This can be useful for development and QA.

  • GET /api/observe
    Returns recent requests and the server's responses (most recent first). Query param limit controls how many entries to return (default 20).

Example:

curl -sS "http://localhost:8000/api/observe?limit=5" | jq
  • GET /api/observe/<id>
    Retrieve a specific observation by id (an id is included on each returned observation).
curl -sS http://localhost:8000/api/observe/123e4567-e89b-12d3-a456-426614174000 | jq

Configuration:

  • UNFORK_OBSERVE_MAX (env var, default 200) — controls how many observations are stored in memory.
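Internally, a bounded log like this is typically just a fixed-size deque. Here is a minimal sketch; the entry field names are illustrative, not the server's actual schema:

```python
import os
import uuid
from collections import deque

# Default mirrors the documented UNFORK_OBSERVE_MAX fallback of 200.
OBSERVE_MAX = int(os.environ.get("UNFORK_OBSERVE_MAX", "200"))


class ObservationLog:
    """Fixed-size, in-memory request/response log."""

    def __init__(self, maxlen=OBSERVE_MAX):
        self._entries = deque(maxlen=maxlen)  # oldest entries fall off

    def record(self, request, response):
        entry = {"id": str(uuid.uuid4()),
                 "request": request,
                 "response": response}
        self._entries.append(entry)
        return entry["id"]

    def recent(self, limit=20):
        # Most recent first, matching /api/observe.
        return list(self._entries)[::-1][:limit]

    def get(self, obs_id):
        return next((e for e in self._entries if e["id"] == obs_id), None)
```

A deque with maxlen gives the bounded-memory guarantee for free: once the cap is reached, each append silently evicts the oldest observation.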

Security note: The observation log is stored in memory and intended for local development and testing. Do not enable it in untrusted public deployments or log sensitive data.


Admin UI (browser) 🧭

For convenience there's a simple single-file Admin UI available at /admin (enabled by default). It displays recent observations, lets you click to view full request/response JSON, and supports auto-refresh.

Controls:

  • limit — number of observations to fetch from the server
  • Refresh — manual refresh
  • Auto — toggles auto-refresh (3s interval)

Example: Open in your browser:

http://127.0.0.1:8000/admin

Security note: The Admin UI is intended for local development and QA only. Disable it in production by setting UNFORK_ENABLE_ADMIN=0 in the environment.

The admin UI also supports a live viewer using Server-Sent Events (SSE). Open the UI and click Start Viewer to receive new observations in real time. The SSE endpoint is available at:

/api/observe/stream

This viewer is intended for local development and quick verification; it connects via SSE and will append new observations to the top of the list as they're logged.
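An SSE stream is plain text: each event's payload arrives on data: lines, terminated by a blank line. A minimal consumer-side parser might look like this (assuming each observation is framed as JSON in data: lines, which is an assumption about this server, not a documented guarantee):

```python
import json


def iter_sse_events(lines):
    """Yield parsed JSON payloads from an iterable of SSE lines.

    Consecutive ``data:`` lines accumulate into one event, dispatched on
    the blank line that terminates each event in the SSE wire format.
    """
    data_parts = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            data_parts.append(line[len("data:"):].lstrip())
        elif line == "" and data_parts:
            yield json.loads("\n".join(data_parts))
            data_parts = []
    if data_parts:  # flush a trailing event with no final blank line
        yield json.loads("\n".join(data_parts))
```

This works with any line iterator, e.g. the response object from an HTTP client streaming /api/observe/stream.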


Other compatibility endpoints available:

  • /chroma/api/v1/* - ChromaDB-like API for collection management and queries
  • /qdrant/* - Qdrant-like endpoints for collection/point upserts and searches
  • /ollama/api/* - older Ollama-style embedding endpoints and tags

Refer to unfork_server.py for detailed behavior and all supported paths.


Design Notes 💡

  • Embeddings are pseudo-embeddings: they are deterministic, fast to compute, and designed to be helpful for simple similarity tasks in demos and prototyping. They are not a replacement for model-based embeddings for production semantic search.
  • Tokenization is performed with tokenize_universal which aims to produce stable tokens for both code and prose.
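To give a feel for what "stable tokens for both code and prose" means, here is a rough regex-based stand-in (not the actual tokenize_universal): identifiers, numbers, and individual punctuation characters each become their own token, so a line of code and an ordinary sentence tokenize by the same rules.

```python
import re

# Stand-in for tokenize_universal: identifiers/words, whole numbers, and
# single punctuation characters; whitespace is discarded.
_TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\w\s]")


def tokenize(text):
    """Split code or prose into stable, order-preserving tokens."""
    return _TOKEN_RE.findall(text)
```

Because the same pattern handles both registers, a snippet like def foo(x): and the sentence "Hello, world!" produce tokens by identical rules, which keeps hashing deterministic across content types.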

Development & Contributing ✍️

  • Run the server locally and exercise endpoints via curl, PowerShell Invoke-RestMethod, Postman, or your Python client.
  • To add support for a real model back-end: implement a generator that uses a model's API and replace the simple search-based generation in /api/generate and /api/chat.

If you'd like, open a PR with fixes or feature requests; keep changes small and well-tested.


Troubleshooting

  • If endpoints return 404 for collections, ensure you added documents using /api/add or the Chroma/Qdrant endpoints.
  • Server logs are printed to stdout; running with --debug enables Flask debug mode for easier diagnostics.

License

This project uses an MIT-style license. Update LICENSE as appropriate for your project.

About

When RAG is too forking slow. Vector‑free. Drop‑in. Edge‑compatible.
