simonkral1/deleuze-graphrag

Deleuze GraphRAG Pipeline

This repo contains the assets needed to turn the Deleuze corpus (books + seminars) into a rhizomatic GraphRAG project. OpenAI embeddings drive the GraphRAG internals, and Cohere embed-v4.0 powers the quote-level index. The flow has five stages:

  1. Corpus preparation – convert all PDFs into cleaned text files plus Cohere-ready chunks.
  2. Prompt tuning – build a bilingual extraction prompt that respects Deleuzian ontology.
  3. GraphRAG indexing – ingest the cleaned corpus with OpenAI embeddings and DeepSeek chat.
  4. Quote index – embed chunk-level passages with Cohere embed-v4.0 into a vector store.
  5. Nomad agent – route questions between the GraphRAG concept map and the quote-level vector index.

1. Prepare the corpus

Install the Python dependencies (requires pdftotext from poppler) and run the extractor:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python scripts/prepare_corpus.py \
  --source-dirs pdf_books seminars \
  --output-root deleuze_corpus \
  --chunk-tokens 1200 \
  --chunk-overlap 0.15
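The --chunk-tokens and --chunk-overlap flags produce ~1200-token windows with 15% overlap. A minimal sketch of that windowing logic (whitespace tokens stand in for whatever tokenizer prepare_corpus.py actually uses, which is an assumption):

```python
def chunk_tokens(tokens, chunk_size=1200, overlap=0.15):
    """Yield overlapping windows of tokens.

    Illustrative sketch only: prepare_corpus.py may use a real model
    tokenizer; here whitespace tokens stand in for model tokens.
    """
    step = max(1, int(chunk_size * (1 - overlap)))  # stride between window starts
    for start in range(0, len(tokens), step):
        yield tokens[start:start + chunk_size]
        if start + chunk_size >= len(tokens):
            break

text = "word " * 3000
chunks = list(chunk_tokens(text.split()))
print(len(chunks), len(chunks[0]))  # → 3 1200
```

With a 1200-token window and 0.15 overlap, consecutive windows share 180 tokens, so no sentence straddling a boundary is lost from both chunks.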

Artifacts:

  • graphrag_project/input/data/*.txt — cleaned documents for GraphRAG ingestion.
  • deleuze_corpus/metadata/manifest.jsonl — metadata for each document.
  • deleuze_corpus/chunks/chunks.jsonl — chunked payload for Cohere embeddings / vector DB.
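Each line of chunks.jsonl is one JSON object per chunk. The field names below (doc_id, chunk_index, text) are assumptions for illustration; inspect the script's output for the real schema:

```python
import json

# Hypothetical chunk record -- field names are assumed, not confirmed
# by prepare_corpus.py; check chunks.jsonl for the actual schema.
record = {"doc_id": "mille_plateaux", "chunk_index": 0,
          "text": "The wasp and the orchid..."}
line = json.dumps(record, ensure_ascii=False)

# Reading the file back is one json.loads per line:
parsed = [json.loads(l) for l in [line]]
print(parsed[0]["doc_id"])  # → mille_plateaux
```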

2. Prompt tuning (auto schema discovery)

export COHERE_API_KEY=...
export HYPERBOLIC_API_KEY=...
graphrag prompt-tune \
  --root ./graphrag_project \
  --domain "Deleuzian Philosophy and Schizoanalysis" \
  --selection-method random \
  --limit 50 \
  --language English

Review the generated prompt at graphrag_project/prompts/entity_extraction.txt and keep the bilingual instructions at the top of the file to ensure FR/EN alignment. Re-add the fixed labels (Concept, Persona, Assemblage) if the auto-tuner drifts away from them.
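A quick guard against label drift is to check the tuned prompt for the fixed labels before indexing. A sketch (the prompt path is the one above; the label list is this repo's convention):

```python
from pathlib import Path

REQUIRED_LABELS = ["Concept", "Persona", "Assemblage"]

def missing_labels(prompt_text: str) -> list[str]:
    """Return the fixed entity labels absent from the tuned prompt."""
    return [lbl for lbl in REQUIRED_LABELS if lbl not in prompt_text]

# Against the real file:
# prompt = Path("graphrag_project/prompts/entity_extraction.txt").read_text()
sample = "Entity types: Concept, Assemblage"
print(missing_labels(sample))  # → ['Persona']
```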

3. GraphRAG ingest (with OpenAI embeddings + DeepSeek chat)

graphrag_project/settings.yaml now points to OpenAI embeddings for GraphRAG internals and DeepSeek V3 (Hyperbolic) for summaries/prompt tuning:

embeddings:
  llm:
    type: openai_embedding
    api_key: ${OPENAI_API_KEY}
    api_base: https://api.openai.com/v1
    model: text-embedding-3-large
    dimensions: 3072
models:
  default_chat_model:
    type: openai_chat
    api_key: ${HYPERBOLIC_API_KEY}
    api_base: https://api.hyperbolic.xyz/v1
    model: deepseek-ai/DeepSeek-V3
    temperature: 0.2

Ingest:

export OPENAI_API_KEY=...
export HYPERBOLIC_API_KEY=...
graphrag index --root ./graphrag_project

The pipeline writes graph + community parquet files under graphrag_project/output.

4. Build the quote-level vector store

Use the Cohere-ready chunks JSONL to populate a vector DB (Chroma) so the agent can cite exact passages:

export COHERE_API_KEY=...
python scripts/build_quote_index.py \
  --chunks-path deleuze_corpus/chunks/chunks.jsonl \
  --persist-dir vector_store \
  --collection deleuze_quotes \
  --batch-size 64
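The --batch-size 64 flag controls how many chunks go into each Cohere embed call. The batching itself is just fixed-size slicing; a sketch of the grouping (not the script's exact code):

```python
def batched(items, size=64):
    """Split a list into consecutive batches of at most `size` items,
    mirroring how build_quote_index.py would group chunks per embed
    call (illustrative sketch, not the script's implementation)."""
    return [items[i:i + size] for i in range(0, len(items), size)]

batches = batched(list(range(150)), size=64)
print([len(b) for b in batches])  # → [64, 64, 22]
```

Smaller batches mean more API round-trips; larger ones risk hitting per-request payload limits, so 64 is a middle-ground default.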

This will create a persistent Chroma DB at ./vector_store containing the embeddings from Cohere embed-v4.0.

5. Nomad agent

agents/nomad_agent.py wires together:

  • Graph queries (reading the GraphRAG community parquet exports).
  • Quote retrieval (vector DB powered by Cohere embed-v4.0).

Run:

COHERE_API_KEY=... \
OPENAI_API_KEY=... \
python agents/nomad_agent.py --question "How do the wasp and the orchid illustrate deterritorialization?"

The agent decides whether to call the GraphRAG map, the quote finder, or both.
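That routing decision can be approximated with simple intent heuristics; the keywords below are illustrative assumptions, not the agent's actual logic:

```python
def route(question: str) -> set[str]:
    """Decide which tools to call. Hypothetical heuristic: quote-like
    questions hit the vector index, concept-map questions hit the
    GraphRAG exports, and anything ambiguous hits both. This is a
    sketch, not nomad_agent.py's real implementation."""
    q = question.lower()
    tools = set()
    if any(w in q for w in ("quote", "passage", "where does", "cite")):
        tools.add("quotes")
    if any(w in q for w in ("relate", "concept", "illustrate", "map")):
        tools.add("graph")
    return tools or {"graph", "quotes"}  # default: consult both

print(route("Where does Deleuze quote Spinoza?"))  # → {'quotes'}
```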

Qdrant option for the quote index

If you prefer Qdrant instead of Chroma, set QDRANT_URL and QDRANT_API_KEY in .env and adapt the quote index builder to write/read from your Qdrant collection.
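A minimal .env for the Qdrant variant might look like this (variable names as referenced above; values are placeholders):

```shell
# .env -- Qdrant-backed quote index (all values are placeholders)
QDRANT_URL=https://your-qdrant-instance:6333
QDRANT_API_KEY=...
COHERE_API_KEY=...
```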
