Semantic Graph Application

A clean, extensible application that turns unstructured text into a semantic network for exploration and retrieval. It builds a single graph with typed edges: co-occurrence edges from keyword extraction and semantic edges refined via embeddings + a lightweight GCN. The graph is persisted to Neo4j and exposed via a FastAPI backend; a React/Vite frontend provides a minimal UI for building and exploring.

Features

  • Typed-edge semantic graph (cooccurrence | semantic)
  • Multilingual keyword extraction (EN/IT, auto-detect with override)
  • Embeddings + GCN refinement (PyTorch + PyG)
  • Community detection (Louvain)
  • Neo4j storage (Keyword, Document, RELATED {type, weight}, IN_DOC)
  • API-first backend (FastAPI), minimal React/Vite frontend
  • Apple Silicon–friendly setup; Docker/Compose; CI; structured logging

Requirements

  • macOS (Apple Silicon) or Linux, Python 3.12+
  • Neo4j 5.x (local or Docker)
  • Node.js 20+ (for the frontend)

Quickstart (local, Apple Silicon)

  1. Create environment and install backend deps (uses uv)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv .venv && source .venv/bin/activate
uv sync
  2. Start Neo4j locally and create indexes
brew install neo4j && neo4j start
# Option A: paste into Neo4j Browser (http://localhost:7474)
CREATE INDEX IF NOT EXISTS FOR (k:Keyword) ON (k.name);
CREATE INDEX IF NOT EXISTS FOR (d:Document) ON (d.id);

# Option B: run via cypher-shell (local install)
cypher-shell -a bolt://localhost:7687 -u neo4j -p password -f scripts/setup_neo4j.cypher

# Option C: with Docker Compose Neo4j
# copy the script into the container and run it via cypher-shell
# (uses NEO4J_PASSWORD env or defaults to "password")
docker compose cp scripts/setup_neo4j.cypher neo4j:/var/lib/neo4j/import/setup_neo4j.cypher
docker compose exec neo4j cypher-shell -u neo4j -p "${NEO4J_PASSWORD:-password}" -f /var/lib/neo4j/import/setup_neo4j.cypher
  3. Run backend API
uv run uvicorn app.main:app --reload
  4. Run frontend
cd frontend
npm install
npm run dev

Open http://localhost:3000.

Quickstart (Docker Compose)

docker compose up --build
  • Backend: http://localhost:8000 (OpenAPI at /docs)
  • Frontend: http://localhost:3000
  • Neo4j: Bolt bolt://localhost:7687, Browser http://localhost:7474

Configuration

Copy config.yaml.example to config.yaml and adjust as needed.

  • preset: fast | balanced | max_quality
  • embedding_model, keyword_model
  • top_keywords, gnn_epochs, similarity_threshold
  • language: auto | en | it | ...
  • neo4j_uri, neo4j_user, neo4j_password
  • community_min_size

You can override preset, top_keywords, and similarity_threshold per /build request.
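
An illustrative config.yaml (key names follow the list above; every value here is only an example, not a project default — start from config.yaml.example):

preset: balanced                        # fast | balanced | max_quality
embedding_model: sentence-transformers/paraphrase-multilingual-mpnet-base-v2  # example model id
keyword_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2    # example model id
top_keywords: 15
gnn_epochs: 50
similarity_threshold: 0.6
language: auto                          # auto | en | it | ...
community_min_size: 3
neo4j_uri: bolt://localhost:7687
neo4j_user: neo4j
neo4j_password: password                # change for any non-local setup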

How to use

From the CLI (recommended for large corpora)

Process local text files directly without going through the web API:

# Quick keyword preview (fast, ~5-10 seconds)
uv run python tykli.py build --corpus ./data/sample_corpus/ --preview

# Full pipeline test (no database save)
uv run python tykli.py build --corpus ./data/sample_corpus/ --dry-run

# Build and save to Neo4j
uv run python tykli.py build --corpus ./data/sample_corpus/

# With quality preset override
uv run python tykli.py build --corpus ./data/docs/ --preset max_quality

# With custom parameters
uv run python tykli.py build --corpus ./data/docs/ --threshold 0.65 --top-keywords 20

See CLI_USAGE.md for detailed documentation.
See MODES_COMPARISON.md for a quick reference on --preview vs --dry-run.

Corpus preparation:

  1. Create directory: mkdir -p data/my_corpus
  2. Add .txt files (one document per file)
  3. Run: uv run python tykli.py build --corpus ./data/my_corpus/

Results are saved to Neo4j and immediately queryable via the web UI.

From the frontend

  1. Click "Build" to run ingestion → refinement → analysis → persistence on a small demo corpus.
  2. Toggle edge type (all | co-occurrence | semantic) and refresh to see graph stats and sample edges.
  3. Query documents by terms (AND mode).
  4. Inspect Neighbors for a term; choose semantic or co-occurrence neighbors.

From the API (curl examples)

  • Build a graph from inline corpus (balanced preset, custom threshold)
curl -X POST http://localhost:8000/build \
  -H 'Content-Type: application/json' \
  -d '{
        "corpus": [
          "Graph neural networks refine embeddings for semantic similarity.",
          "Co-occurrence graphs capture term relationships in documents.",
          "Community detection reveals topic clusters in a knowledge graph."
        ],
        "preset": "balanced",
        "similarity_threshold": 0.6
      }'
  • Check job status
curl http://localhost:8000/build/<JOB_ID>/status
  • Get graph (semantic-only edges, limit 500)
curl "http://localhost:8000/graph?edgeType=semantic&limit=500"
  • Get neighbors for a term (co-occurrence)
curl "http://localhost:8000/query/neighbors/graph?edgeType=cooccurrence&limit=10"
  • Get documents by terms (AND mode)
curl "http://localhost:8000/query/docs?terms=graph,semantic&mode=and"
  • Get communities
curl http://localhost:8000/analyze/communities

Data model

  • Node: Keyword {name, weight, community}
  • Edge: RELATED {type: 'cooccurrence'|'semantic', weight: float}
  • Node: Document {id, content} (content truncated)
  • Relationships: (k)-[:IN_DOC]->(d), (k1)-[:RELATED]->(k2)
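
Two illustrative Cypher queries against this model (run in Neo4j Browser or cypher-shell; the term "graph" and the limits are just examples):

// Semantic neighbors of a keyword, strongest first
MATCH (k:Keyword {name: 'graph'})-[r:RELATED]-(n:Keyword)
WHERE r.type = 'semantic'
RETURN n.name AS neighbor, r.weight AS weight
ORDER BY weight DESC
LIMIT 10;

// Documents containing any of the given keywords
MATCH (k:Keyword)-[:IN_DOC]->(d:Document)
WHERE k.name IN ['graph', 'semantic']
RETURN d.id AS document, collect(k.name) AS matched_terms;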

Presets & performance

  • fast: small model, fewer keywords, fewer epochs (quick exploration)
  • balanced (default): multilingual mpnet model, moderate keyword count and epochs (quality/performance balance)
  • max_quality: all-mpnet-base-v2, more keywords and epochs (best semantic quality)

Targets (M4 Max): build 1,000 documents in under 8 minutes (the first run also downloads models); neighbor queries in under 300 ms.

Development

  • Tests
uv run pytest -q
  • Lint & type check
uv run ruff check .
uv run mypy app
  • Logs: structured JSON; stages logged (ingestion/refinement/analysis/persist)

Troubleshooting

  • Torch/PyG install on macOS: pins are in pyproject.toml/uv.lock. Prefer CPU; MPS may not accelerate all ops.
  • Neo4j auth: default user neo4j, password password (change in config.yaml).
  • Slow model downloads: check network access and set a local model cache (e.g., HF_HOME; see the example below).
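
For example, to keep model downloads in a persistent Hugging Face cache directory (the path is only an example), export HF_HOME before building:

export HF_HOME=/path/to/hf-cache
uv run python tykli.py build --corpus ./data/sample_corpus/ --preview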

Roadmap

  • Better interactive graph viz (physics layouts, subgraph focus)
  • Optional vector DB for hybrid search
  • Cloud offloading for GNN refinement
  • Advanced entity/phrase extraction and document preview UX
