Merged
34 changes: 8 additions & 26 deletions .env.example
Original file line number Diff line number Diff line change
@@ -5,35 +5,17 @@
# .env is gitignored for security

# ==============================================================================
# NEO4J CONFIGURATION
# GRAFEO CONFIGURATION (Embedded Graph Database)
# ==============================================================================
# Neo4j connection (shared by graph_manager and archimate_manager)
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=
NEO4J_PASSWORD=
NEO4J_DATABASE=neo4j
NEO4J_ENCRYPTED=false

# Neo4j connection pool settings
NEO4J_MAX_CONNECTION_LIFETIME=3600
NEO4J_MAX_CONNECTION_POOL_SIZE=50
NEO4J_CONNECTION_ACQUISITION_TIMEOUT=60

# Neo4j logging
NEO4J_LOG_LEVEL=INFO
NEO4J_LOG_QUERIES=false
# Suppress verbose Neo4j notifications (DEPRECATION, UNRECOGNIZED, HINT)
NEO4J_SUPPRESS_NOTIFICATIONS=true

# Manager namespaces (Neo4j label prefixes)
NEO4J_GRAPH_NAMESPACE=Graph
# Leave GRAFEO_DB_PATH empty for in-memory (default), or set a path for persistent storage
GRAFEO_DB_PATH=
GRAFEO_LOG_QUERIES=false

# ==============================================================================
# ARCHIMATE MANAGER CONFIGURATION
# ==============================================================================
# ArchiMate manager namespace (fallback: NEO4J_NAMESPACE_ARCHIMATE)
# Graph namespace (label prefix for extraction data)
GRAPH_NAMESPACE=Graph

# ArchiMate namespace (label prefix for model data)
ARCHIMATE_NAMESPACE=Model
NEO4J_NAMESPACE_ARCHIMATE=Model

# Validation settings
ARCHIMATE_VALIDATION_STRICT_MODE=false
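
The new variables in this template could be read along the following lines. This is a minimal sketch, not the project's `GrafeoSettings` class: `GraphEnvSketch` and `load_graph_env` are hypothetical names, and the `Graph` / `Model` defaults are assumed to match the values shown in the template above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GraphEnvSketch:
    """Hypothetical container for illustration; not the project's GrafeoSettings."""

    db_path: Optional[str]  # None means in-memory storage
    log_queries: bool
    graph_namespace: str
    archimate_namespace: str


def load_graph_env(env: Mapping[str, str] = os.environ) -> GraphEnvSketch:
    """Read the graph-related variables, treating an empty GRAFEO_DB_PATH as in-memory."""
    db_path = env.get("GRAFEO_DB_PATH", "").strip()
    return GraphEnvSketch(
        db_path=db_path or None,
        # Booleans are plain strings in the env file; compare case-insensitively.
        log_queries=env.get("GRAFEO_LOG_QUERIES", "false").strip().lower() == "true",
        # Assumed defaults, taken from the template values above.
        graph_namespace=env.get("GRAPH_NAMESPACE", "Graph"),
        archimate_namespace=env.get("ARCHIMATE_NAMESPACE", "Model"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests without touching the process environment.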
31 changes: 30 additions & 1 deletion .github/workflows/ci.yml
@@ -2,7 +2,7 @@ name: CI

on:
push:
branches: [main, dev]
branches: [main, dev, "release/*"]
pull_request:
branches: [main, dev]

@@ -34,6 +34,14 @@ jobs:
- name: Run Ruff formatter check
run: uv run ruff format --check deriva/

typos:
name: Typos
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Check for typos
uses: crate-ci/typos@v1.29.4

typecheck:
name: Type Check
runs-on: ubuntu-latest
@@ -85,3 +93,24 @@ jobs:
fail_ci_if_error: false
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}

security:
name: Security Audit
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- name: Install uv
uses: astral-sh/setup-uv@v4
with:
version: "latest"
enable-cache: true

- name: Set up Python
run: uv python install ${{ env.PYTHON_VERSION }}

- name: Install dependencies
run: uv sync --extra dev

- name: Audit dependencies
run: uv run pip-audit --ignore-vuln CVE-2025-69872 # diskcache pickle CVE; mitigated via JSONDisk
1 change: 0 additions & 1 deletion .gitignore
@@ -1,7 +1,6 @@
*__pycache__*
temp_*
tmp_*
.cl*/*
.env
workspace/*
deriva/adapters/neo4j/data/*
47 changes: 43 additions & 4 deletions .pre-commit-config.yaml
@@ -1,7 +1,46 @@
# Pre-commit hooks for Deriva
# Install: cargo install prek && prek install
# Run manually: prek run --all-files
# Docs: https://github.com/j178/prek

repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.8.6
# Built-in prek hooks
- repo: builtin
hooks:
- id: ruff
args: [--fix, --exit-non-zero-on-fix]
- id: trailing-whitespace
exclude: '\.md$'
- id: end-of-file-fixer
exclude: '\.md$'
- id: check-yaml
- id: check-toml
- id: check-merge-conflict
- id: check-added-large-files
args: ['--maxkb=1000']

# Typo checker
- repo: https://github.com/crate-ci/typos
rev: v1.29.4
hooks:
- id: typos

# Python hooks
- repo: local
hooks:
- id: ruff-check
name: ruff check
entry: uv run ruff check --fix --exit-non-zero-on-fix
language: system
types: [python]

- id: ruff-format
name: ruff format
entry: uv run ruff format
language: system
types: [python]

- id: ty-check
name: ty check
entry: uv run ty check deriva/
language: system
pass_filenames: false
types: [python]
12 changes: 6 additions & 6 deletions ARCHITECTURE.MD
@@ -42,8 +42,8 @@ Repository --> Extraction --> Graph --> Derivation --> ArchiMate Model --> Expor
| +-------------------------------+ +-----------------------------------+ |
| | External System Adapters | | Business Logic Modules | |
| | +---------+ +-------------+ | | +-------------+ +-------------+ | |
| | | Neo4j | | Database | | | | Extraction | | Derivation | | |
| | | (neo4j)| | (database) | | | | | | | | |
| | | Grafeo | | Database | | | | Extraction | | Derivation | | |
| | | (grafeo)| | (database) | | | | | | | | |
| | +---------+ +-------------+ | | | - Business | | - Prep | | |
| | +---------+ +-------------+ | | | - TypeDef | | - Generate | | |
| | | Graph | | ArchiMate | | | | - Method | | - Refine | | |
@@ -101,7 +101,7 @@ External system integrations:

| Adapter | Purpose |
|---------|---------|
| `neo4j/` | Shared Neo4j connection with namespace isolation |
| `grafeo/` | Embedded graph database with namespace isolation |
| `database/` | DuckDB for configuration and metadata storage |
| `graph/` | Graph operations (nodes, edges) in "Graph" namespace |
| `archimate/` | ArchiMate model operations in "Model" namespace |
@@ -234,16 +234,16 @@ Each layer has a `ruff.toml` file enforcing boundaries:

| Store | Purpose | Location |
|-------|---------|----------|
| Neo4j (Graph) | Intermediate representation | `bolt://localhost:7687` |
| Neo4j (Model) | ArchiMate elements/relationships | `bolt://localhost:7687` |
| Grafeo (Graph) | Intermediate representation | Embedded (in-memory or `GRAFEO_DB_PATH`) |
| Grafeo (Model) | ArchiMate elements/relationships | Embedded (shared instance) |
| DuckDB | Configuration, metadata | `deriva/adapters/database/sql.db` |
| Workspace | Repositories, benchmarks, exports | `workspace/` |

## Key Design Decisions

1. **Services as API Layer**: CLI and App only interact with the backend through `PipelineSession`, ensuring consistent behavior and single point of change.

2. **Namespace Isolation**: Neo4j uses label prefixes (`Graph:`, `Model:`) to separate extraction data from ArchiMate model data in a single database.
2. **Namespace Isolation**: Grafeo uses label prefixes (`Graph:`, `Model:`) to separate extraction data from ArchiMate model data in a single embedded database.

3. **Configuration Versioning**: All config changes create new versions, enabling rollback and A/B testing during optimization.
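
The label-prefix scheme from decision 2 can be sketched as below. This is an illustration under assumptions: `qualify_label` and `in_namespace` are hypothetical helpers, and the `Repository` label in the usage note is only an example; how the grafeo adapter actually stores prefixed labels is not shown in this diff.

```python
def qualify_label(namespace: str, label: str) -> str:
    """Prefix a label with its namespace, using the 'Graph:' / 'Model:' style."""
    if not namespace or not label:
        raise ValueError("namespace and label must both be non-empty")
    return f"{namespace}:{label}"


def in_namespace(qualified_label: str, namespace: str) -> bool:
    """Return True if a qualified label belongs to the given namespace."""
    return qualified_label.startswith(f"{namespace}:")
```

For example, `qualify_label("Graph", "Repository")` yields a label under the `Graph:` prefix, and `in_namespace` on that result is false for `"Model"`, which is what keeps extraction data and ArchiMate model data apart inside one database.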

17 changes: 15 additions & 2 deletions CHANGELOG.md
@@ -8,9 +8,22 @@ Deriving ArchiMate models from code using knowledge graphs, heuristics, and LLMs

Version 0.7.x is all about stability, portability, user experience, documentation and clean architecture/code standards.

## v0.7.0 - (Unreleased)
## v0.7.0 - Grafeo Migration (March 1, 2026)

TBD
Replaced Neo4j (Docker container) with grafeo, an embedded Rust graph database. This removes the external Docker dependency entirely: the graph database now runs in-process (500-1000x speedup).

### Infrastructure

- **Grafeo adapter**: New `deriva/adapters/grafeo/` with `GrafeoConnection`, a drop-in replacement for the old `Neo4jConnection`, using a shared `GrafeoDB` singleton with namespace isolation
- **No Docker required**: Graph database is embedded (in-memory by default, persistent via `GRAFEO_DB_PATH` env var)
- **Removed Neo4j**: Deleted `deriva/adapters/neo4j/` and all Neo4j driver dependencies

### Breaking Changes

- `Neo4jSettings` → `GrafeoSettings` (env prefix: `GRAFEO_`)
- `NEO4J_GRAPH_NAMESPACE` → `GRAPH_NAMESPACE`, `NEO4J_NAMESPACE_ARCHIMATE` → `ARCHIMATE_NAMESPACE`
- `session.start_neo4j()` / `stop_neo4j()` → `start_graph_db()` / `stop_graph_db()`
- `get_enrichments_from_neo4j()` → `get_enrichments_from_graph()`
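
For an existing `.env` file, the environment variable renames listed above could be applied with a small script like this one. It is a sketch under assumptions: `migrate_env_lines` is a hypothetical helper that is not part of this PR, and dropping the remaining `NEO4J_*` keys is an assumption based on their removal from `.env.example`.

```python
# Old key -> new key, as listed under "Breaking Changes".
RENAMES = {
    "NEO4J_GRAPH_NAMESPACE": "GRAPH_NAMESPACE",
    "NEO4J_NAMESPACE_ARCHIMATE": "ARCHIMATE_NAMESPACE",
}


def _key_of(line: str):
    """Return the variable name of a KEY=VALUE line, or None for comments and blanks."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or "=" not in stripped:
        return None
    return stripped.split("=", 1)[0].strip()


def migrate_env_lines(lines):
    """Rename namespace keys and drop other NEO4J_* keys; everything else passes through."""
    present = {key for key in map(_key_of, lines) if key is not None}
    migrated = []
    for line in lines:
        key = _key_of(line)
        if key is None:
            migrated.append(line)  # comments and blank lines are kept as-is
        elif key in RENAMES:
            new_key = RENAMES[key]
            # Skip the rename if the new key is already defined elsewhere in the file.
            if new_key not in present:
                value = line.split("=", 1)[1]
                migrated.append(f"{new_key}={value}")
        elif key.startswith("NEO4J_"):
            continue  # assumption: connection settings are no longer needed
        else:
            migrated.append(line)
    return migrated
```

The function works on a list of lines rather than a file path so it can be checked on sample input before rewriting a real `.env`.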

---

27 changes: 13 additions & 14 deletions CONTRIBUTING.md
@@ -23,7 +23,7 @@ cd Deriva

# Copy environment template
cp .env.example .env
# Edit .env with your configuration (Neo4j, LLM keys, etc.)
# Edit .env with your configuration (LLM keys, etc.)

# Install with dev dependencies
uv sync --all-extras
@@ -92,7 +92,7 @@ deriva/
├── adapters/ (Stateful I/O services)
│ ├── database/ - DuckDB configuration storage
│ ├── neo4j/ - Neo4j connection pool
│ ├── grafeo/ - Embedded graph database (grafeo)
│ ├── repository/ - Git operations
│ ├── graph/ - Graph CRUD (namespace: Graph)
│ ├── archimate/ - ArchiMate CRUD (namespace: Model)
@@ -213,7 +213,7 @@ Marimo: displays results in UI | CLI: prints summary to stdout
| Column | Purpose |
|--------|---------|
| Column 0 | Run Deriva (pipeline buttons, status callouts) |
| Column 1 | Configuration (runs, repos, Neo4j, graph stats, ArchiMate, LLM) |
| Column 1 | Configuration (runs, repos, graph database, graph stats, ArchiMate, LLM) |
| Column 2 | Extraction Settings (file types, extraction step config) |
| Column 3 | Derivation Settings (13 element types across Business/Application/Technology layers) |

@@ -422,11 +422,10 @@ What does this class represent? Why does it exist?
```python
class GraphManager:
"""
High-level interface for Neo4j graph operations.
High-level interface for graph database operations.

Wraps Neo4j driver complexity and provides domain-specific operations
like "add repository" rather than raw Cypher. Uses connection pooling
internally so you can safely create instances per-request.
Wraps grafeo complexity and provides domain-specific operations
like "add repository" rather than raw Cypher.
"""
```

@@ -475,7 +474,7 @@ Comments explain **why**, not **what**.
if raw.startswith(b'\xff\xfe'):
return raw.decode('utf-16-le')

# Neo4j MERGE needs deterministic IDs to avoid duplicates
# MERGE needs deterministic IDs to avoid duplicates
node_id = f"Repository_{repo_name}"

# LLMs sometimes return markdown-wrapped JSON
@@ -540,7 +539,7 @@ def read_file(path: Path) -> str | None:
def connect(self) -> None:
"""Raises ConnectionError if connection fails."""
try:
self._driver = neo4j.GraphDatabase.driver(self._uri)
self._db = get_database()
except Exception as e:
raise ConnectionError(f"Failed to connect: {e}") from e
```
@@ -670,7 +669,7 @@ name = 'Deriva'

### Dependencies

- Can import **infrastructure adapters** (e.g., GraphManager ← Neo4jConnection)
- Can import **infrastructure adapters** (e.g., GraphManager ← GrafeoConnection)
- **Cannot** import other domain adapters (e.g., GraphManager ✗← ArchimateManager)
- **Cannot** import modules

@@ -1341,7 +1340,7 @@ Deriva splits configuration by **ownership** - who needs to change it and why:

### Environment Variables

- Naming: `{MANAGER}_{CATEGORY}_{SETTING}` (e.g., `NEO4J_POOL_SIZE`)
- Naming: `{MANAGER}_{CATEGORY}_{SETTING}` (e.g., `GRAFEO_DB_PATH`)
- Provide **sensible defaults** in code if env var missing
- Comma-separated for lists (e.g., `ARCHIMATE_ELEMENT_TYPES=Component,Service`)
- Boolean as string: `true`/`false` (case-insensitive)
@@ -1764,7 +1763,7 @@ def _(session, mo):
| `run_derivation(...)` | Run derivation pipeline |
| `run_pipeline(...)` | Run full pipeline |
| `export_model(path, name)` | Export ArchiMate XML |
| `start_neo4j()` / `stop_neo4j()` | Container control |
| `start_graph_db()` / `stop_graph_db()` | Graph database control |
| `clear_graph()` / `clear_model()` | Clear data |

</details>
@@ -1853,7 +1852,7 @@ def test_add_and_get_node():
assert retrieved['id'] == 'test1'
```

Adapter tests may require external services (Neo4j, DuckDB) - use fixtures for setup/teardown.
Adapter tests may require external services (DuckDB) - use fixtures for setup/teardown.

---

@@ -1892,7 +1891,7 @@ This project includes several specialized documentation files:
| LLM Adapter | [deriva/adapters/llm/README.md](deriva/adapters/llm/README.md) |
| Graph Adapter | [deriva/adapters/graph/README.md](deriva/adapters/graph/README.md) |
| Database Adapter | [deriva/adapters/database/README.md](deriva/adapters/database/README.md) |
| Neo4j Adapter | [deriva/adapters/neo4j/README.md](deriva/adapters/neo4j/README.md) |
| Grafeo Adapter | [deriva/adapters/grafeo/README.md](deriva/adapters/grafeo/README.md) |
| ArchiMate Adapter | [deriva/adapters/archimate/README.md](deriva/adapters/archimate/README.md) |
| Repository Adapter | [deriva/adapters/repository/README.md](deriva/adapters/repository/README.md) |
| Marimo App | [deriva/app/README.md](deriva/app/README.md) |
2 changes: 1 addition & 1 deletion OPTIMIZATION.md
@@ -476,7 +476,7 @@ enrichments = enrich.enrich_graph(
edges=edges,
algorithms=['pagerank', 'louvain', 'kcore', 'articulation_points', 'degree']
)
# Write to Neo4j: graph_manager.batch_update_properties(enrichments)
# Write to graph: graph_manager.batch_update_properties(enrichments)
```

</details>