A clean, extensible application that turns unstructured text into a semantic network for exploration and retrieval. It builds a single graph with typed edges: co-occurrence edges from keyword extraction, and semantic edges refined via embeddings plus a lightweight GCN. The graph is persisted to Neo4j and exposed through a FastAPI backend; a React/Vite frontend provides a minimal UI for building and exploring the graph.
- Typed-edge semantic graph (cooccurrence | semantic)
- Multilingual keyword extraction (EN/IT, auto-detect with override)
- Embeddings + GCN refinement (PyTorch + PyG)
- Community detection (Louvain)
- Neo4j storage (`Keyword`, `Document`, `RELATED {type, weight}`, `IN_DOC`)
- API-first backend (FastAPI), minimal React/Vite frontend
- Apple Silicon–friendly setup; Docker/Compose; CI; structured logging
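To make the typed-edge distinction concrete, the sketch below derives `semantic` edges by encoding keywords with `all-mpnet-base-v2` (the `max_quality` model referenced later) and keeping pairs above a similarity threshold. It is illustrative only: the real pipeline also refines the embeddings with a GCN and runs Louvain community detection before persisting to Neo4j.

```python
# Illustrative sketch only: derive 'semantic' edges from embedding similarity.
# The actual pipeline additionally refines embeddings with a GCN (PyTorch + PyG)
# and runs Louvain community detection before persisting to Neo4j.
from itertools import combinations

from sentence_transformers import SentenceTransformer, util

keywords = ["graph neural network", "embedding", "community detection", "keyword extraction"]

model = SentenceTransformer("all-mpnet-base-v2")  # model used by the max_quality preset
emb = model.encode(keywords, convert_to_tensor=True, normalize_embeddings=True)

similarity_threshold = 0.6  # same knob exposed in config.yaml and per /build request
edges = []
for i, j in combinations(range(len(keywords)), 2):
    score = float(util.cos_sim(emb[i], emb[j]))
    if score >= similarity_threshold:
        edges.append((keywords[i], keywords[j], {"type": "semantic", "weight": score}))

print(edges)
```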
- macOS (Apple Silicon) or Linux, Python 3.12+
- Neo4j 5.x (local or Docker)
- Node.js 20+ (for the frontend)
- Create environment and install backend deps (uses `uv`)

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv .venv && source .venv/bin/activate
uv sync
```

- Start Neo4j locally and create indexes
```bash
brew install neo4j && neo4j start
```

Option A: paste into the Neo4j Browser (http://localhost:7474)

```cypher
CREATE INDEX IF NOT EXISTS FOR (k:Keyword) ON (k.name);
CREATE INDEX IF NOT EXISTS FOR (d:Document) ON (d.id);
```

Option B: run via cypher-shell (local install)

```bash
cypher-shell -a bolt://localhost:7687 -u neo4j -p password -f scripts/setup_neo4j.cypher
```

Option C: with Docker Compose Neo4j, copy the script into the container and run it via cypher-shell (uses the `NEO4J_PASSWORD` env var or defaults to `password`)

```bash
docker compose cp scripts/setup_neo4j.cypher neo4j:/var/lib/neo4j/import/setup_neo4j.cypher
docker compose exec neo4j cypher-shell -u neo4j -p "${NEO4J_PASSWORD:-password}" -f /var/lib/neo4j/import/setup_neo4j.cypher
```
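If you prefer to set up from Python instead of cypher-shell, the same statements can be run with the official `neo4j` driver; a minimal sketch, assuming the default local credentials:

```python
# Optional: create the same indexes from Python.
# Assumes the `neo4j` driver is installed and the default credentials from config.yaml.
from neo4j import GraphDatabase

statements = [
    "CREATE INDEX IF NOT EXISTS FOR (k:Keyword) ON (k.name)",
    "CREATE INDEX IF NOT EXISTS FOR (d:Document) ON (d.id)",
]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
driver.verify_connectivity()  # fail fast if Neo4j is not reachable
with driver.session() as session:
    for stmt in statements:
        session.run(stmt)
driver.close()
```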
- Run backend API

```bash
uv run uvicorn app.main:app --reload
```

- Run frontend
```bash
cd frontend
npm install
npm run dev
```

Open http://localhost:3000.
```bash
docker compose up --build
```

- Backend: http://localhost:8000 (OpenAPI at `/docs`)
- Frontend: http://localhost:3000
- Neo4j: Bolt `bolt://localhost:7687`, Browser at http://localhost:7474
Copy `config.yaml.example` to `config.yaml` and adjust as needed.

- `preset`: `fast | balanced | max_quality`
- `embedding_model`, `keyword_model`
- `top_keywords`, `gnn_epochs`, `similarity_threshold`
- `language`: `auto | en | it | ...`
- `neo4j_uri`, `neo4j_user`, `neo4j_password`
- `community_min_size`

You can override `preset`, `top_keywords`, and `similarity_threshold` per `/build` request.
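How these settings might compose with per-request overrides, as a sketch assuming a plain PyYAML load (the project's actual config loader may differ):

```python
# Sketch: load config.yaml and apply the per-request overrides allowed on /build.
# Assumes PyYAML; the project's actual config handling may differ.
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

def effective_settings(config: dict, request: dict) -> dict:
    """Request values win for the three overridable keys; config supplies the rest."""
    overridable = ("preset", "top_keywords", "similarity_threshold")
    settings = dict(config)
    for key in overridable:
        if request.get(key) is not None:
            settings[key] = request[key]
    return settings

print(effective_settings(config, {"preset": "max_quality", "similarity_threshold": 0.65}))
```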
Process local text files directly without going through the web API:
```bash
# Quick keyword preview (fast, ~5-10 seconds)
uv run python tykli.py build --corpus ./data/sample_corpus/ --preview

# Full pipeline test (no database save)
uv run python tykli.py build --corpus ./data/sample_corpus/ --dry-run

# Build and save to Neo4j
uv run python tykli.py build --corpus ./data/sample_corpus/

# With quality preset override
uv run python tykli.py build --corpus ./data/docs/ --preset max_quality

# With custom parameters
uv run python tykli.py build --corpus ./data/docs/ --threshold 0.65 --top-keywords 20
```

See CLI_USAGE.md for detailed documentation.
See MODES_COMPARISON.md for a quick reference on `--preview` vs `--dry-run`.
Corpus preparation:
- Create directory: `mkdir -p data/my_corpus`
- Add `.txt` files (one document per file)
- Run: `uv run python tykli.py build --corpus ./data/my_corpus/`
Results are saved to Neo4j and immediately queryable via the web UI.
- Click "Build" to run ingestion → refinement → analysis → persistence on a small demo corpus.
- Toggle edge type (all | co-occurrence | semantic) and refresh to see graph stats and sample edges.
- Query documents by terms (AND mode).
- Inspect Neighbors for a term; choose semantic or co-occurrence neighbors.
- Build a graph from inline corpus (balanced preset, custom threshold)

```bash
curl -X POST http://localhost:8000/build \
  -H 'Content-Type: application/json' \
  -d '{
    "corpus": [
      "Graph neural networks refine embeddings for semantic similarity.",
      "Co-occurrence graphs capture term relationships in documents.",
      "Community detection reveals topic clusters in a knowledge graph."
    ],
    "preset": "balanced",
    "similarity_threshold": 0.6
  }'
```

- Check job status

```bash
curl http://localhost:8000/build/<JOB_ID>/status
```

- Get graph (semantic-only edges, limit 500)

```bash
curl "http://localhost:8000/graph?edgeType=semantic&limit=500"
```

- Get neighbors for a term (co-occurrence)

```bash
curl "http://localhost:8000/query/neighbors/graph?edgeType=cooccurrence&limit=10"
```

- Get documents by terms (AND mode)

```bash
curl "http://localhost:8000/query/docs?terms=graph,semantic&mode=and"
```

- Get communities

```bash
curl http://localhost:8000/analyze/communities
```
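The same build-then-query flow from Python, using `requests` (a sketch: the job id and status field names in the responses are assumptions, so adjust them to what the API actually returns):

```python
# Sketch of the build-then-query flow against the endpoints above.
# Assumes the `requests` package; response field names (e.g. the job id key and
# status values) are illustrative and should be adjusted to the actual payloads.
import time

import requests

BASE = "http://localhost:8000"

resp = requests.post(
    f"{BASE}/build",
    json={
        "corpus": ["Graph neural networks refine embeddings for semantic similarity."],
        "preset": "balanced",
        "similarity_threshold": 0.6,
    },
)
resp.raise_for_status()
job = resp.json()
job_id = job.get("job_id") or job.get("id")  # assumed field name

while True:
    status = requests.get(f"{BASE}/build/{job_id}/status").json()
    print(status)
    if status.get("status") in {"completed", "failed"}:  # assumed status values
        break
    time.sleep(2)

graph = requests.get(f"{BASE}/graph", params={"edgeType": "semantic", "limit": 500}).json()
```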
- Node: `Keyword {name, weight, community}`
- Edge: `RELATED {type: 'cooccurrence'|'semantic', weight: float}`
- Document: `Document {id, content}` (content truncated)
- Relationships: `(k)-[:IN_DOC]->(d)`, `(k1)-[:RELATED]->(k2)`
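Given this schema, a quick way to inspect the strongest semantic links directly in Neo4j; a sketch using the official Python driver and the default credentials:

```python
# Sketch: list the strongest semantic RELATED edges straight from Neo4j.
# Assumes the `neo4j` driver and the default credentials from config.yaml.
from neo4j import GraphDatabase

query = """
MATCH (k1:Keyword)-[r:RELATED]->(k2:Keyword)
WHERE r.type = 'semantic'
RETURN k1.name AS source, k2.name AS target, r.weight AS weight
ORDER BY r.weight DESC
LIMIT 10
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for record in session.run(query):
        print(f"{record['source']} -[{record['weight']:.2f}]-> {record['target']}")
driver.close()
```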
- `fast`: small model, fewer keywords, fewer epochs (quick exploration)
- `balanced` (default): multilingual mpnet, mid keywords/epochs (quality/perf balance)
- `max_quality`: all-mpnet-base-v2, more keywords/epochs (best semantics)
Targets (M4 Max): build 1k docs in < 8 min (first run downloads models), query neighbors in < 300ms.
- Tests

```bash
uv run pytest -q
```

- Lint & type check

```bash
uv run ruff check .
uv run mypy app
```

- Logs: structured JSON; stages logged (ingestion/refinement/analysis/persist)
- Torch/PyG install on macOS: pins are in `pyproject.toml`/`uv.lock`. Prefer CPU; MPS may not accelerate all ops.
- Neo4j auth: default user `neo4j`, password `password` (change in `config.yaml`).
- Slow model downloads: ensure network access; set a local cache (e.g., `HF_HOME`).
- Better interactive graph viz (physics layouts, subgraph focus)
- Optional vector DB for hybrid search
- Cloud offloading for GNN refinement
- Advanced entity/phrase extraction and document preview UX