24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,30 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.4.3] - 2026-01-25

### Added
- **SchemaSanitizer for TigerGraph**: Added comprehensive schema sanitization for TigerGraph compatibility
- `SchemaSanitizer` class in `graflo.hq.sanitizer` module for sanitizing schema attributes
- Sanitizes vertex names and field names to avoid reserved words (appends `_vertex` suffix for vertex names, `_attr` for attributes)
- Sanitizes edge relation names to avoid reserved words and collisions with vertex names (appends `_relation` suffix)
- Normalizes vertex indexes for TigerGraph: ensures edges with the same relation have consistent source and target indexes
- Automatically applies field index mappings to resources when indexes are normalized
- Handles field name transformations in TransformActor instances to maintain data consistency
- **Vertex `dbname` field**: Added `dbname` field to `Vertex` class for database-specific vertex name mapping
- Allows specifying a different database name than the logical vertex name
- Used by SchemaSanitizer to store sanitized vertex names for TigerGraph compatibility
- **Edge `relation_dbname` property**: Added `relation_dbname` property to `Edge` class for database-specific relation name mapping
- Returns sanitized relation name if set, otherwise falls back to `relation` field
- Used by SchemaSanitizer to store sanitized relation names for TigerGraph compatibility
- Supports setter for updating the database-specific relation name
- **GraphEngine orchestrator**: Added `GraphEngine` class as the main orchestrator for graph database operations
- Coordinates schema inference, pattern creation, and data ingestion workflows
- Provides unified interface: `infer_schema()`, `create_patterns()`, and `ingest()` methods (see the usage sketch after this list)
- Integrates `InferenceManager`, `ResourceMapper`, and `Caster` components
- Supports target database flavor configuration for schema sanitization
- Located in `graflo.hq.graph_engine` module
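
A minimal usage sketch of how these additions might fit together, assuming an existing `postgres_conn` (PostgresConnection) and `postgres_conf` (PostgresConfig) as in the examples further down; the `TIGERGRAPH` flavor name and the `schema.vertices` / `schema.edges` accessors are illustrative assumptions, not confirmed API:

```python
# Hedged sketch only: the flavor name, the schema accessors, and the sample
# output values are assumptions; GraphEngine applies SchemaSanitizer internally.
from graflo.hq import GraphEngine
from graflo.onto import DBFlavor

engine = GraphEngine(target_db_flavor=DBFlavor.TIGERGRAPH)  # target flavor drives sanitization

# Unified interface: infer the schema and build table patterns
schema = engine.infer_schema(postgres_conn, schema_name="public")
patterns = engine.create_patterns(postgres_conf, schema_name="public")

# Sanitized, database-specific names sit alongside the logical names
for vertex in schema.vertices:                       # hypothetical accessor
    print(vertex.name, "->", vertex.dbname)          # e.g. "order" -> "order_vertex"
for edge in schema.edges:                            # hypothetical accessor
    print(edge.relation, "->", edge.relation_dbname)

# relation_dbname falls back to relation until its setter is used
edge = next(iter(schema.edges))                      # hypothetical accessor
edge.relation_dbname = "works_at_relation"           # setter updates the db-specific name
```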

## [1.4.0] - 2026-01-15

### Removed
10 changes: 5 additions & 5 deletions README.md
@@ -122,7 +122,7 @@ patterns.add_file_pattern(

schema.fetch_resource()

from graflo.caster import IngestionParams
from graflo.hq.caster import IngestionParams

caster = Caster(schema)

@@ -143,7 +143,7 @@ caster.ingest(

```python
from graflo.db.postgres import PostgresConnection
from graflo.db.inferencer import infer_schema_from_postgres
from graflo.hq import GraphEngine
from graflo.db.connection.onto import PostgresConfig
from graflo import Caster
from graflo.onto import DBFlavor
@@ -152,11 +152,11 @@ from graflo.onto import DBFlavor
postgres_config = PostgresConfig.from_docker_env() # or PostgresConfig.from_env()
postgres_conn = PostgresConnection(postgres_config)

# Infer schema from PostgreSQL 3NF database
schema = infer_schema_from_postgres(
# Create GraphEngine and infer schema from PostgreSQL 3NF database
engine = GraphEngine(target_db_flavor=DBFlavor.ARANGO)
schema = engine.infer_schema(
postgres_conn,
schema_name="public", # PostgreSQL schema name
db_flavor=DBFlavor.ARANGO # Target graph database flavor
)

# Close PostgreSQL connection
2 changes: 1 addition & 1 deletion docs/examples/example-1.md
@@ -116,7 +116,7 @@ patterns.add_file_pattern(
# }
# )

from graflo.caster import IngestionParams
from graflo.hq.caster import IngestionParams

caster = Caster(schema)

2 changes: 1 addition & 1 deletion docs/examples/example-2.md
@@ -129,7 +129,7 @@ patterns.add_file_pattern(
FilePattern(regex="\Sjson$", sub_path=pathlib.Path("."), resource_name="work")
)

from graflo.caster import IngestionParams
from graflo.hq.caster import IngestionParams

ingestion_params = IngestionParams(
clean_start=True, # Wipe existing database before ingestion
2 changes: 1 addition & 1 deletion docs/examples/example-3.md
@@ -115,7 +115,7 @@ patterns.add_file_pattern(
FilePattern(regex="^relations.*\.csv$", sub_path=pathlib.Path("."), resource_name="people")
)

from graflo.caster import IngestionParams
from graflo.hq.caster import IngestionParams

caster = Caster(schema)

2 changes: 1 addition & 1 deletion docs/examples/example-4.md
@@ -209,7 +209,7 @@ patterns.add_file_pattern(
FilePattern(regex=r"^bugs.*\.json(?:\.gz)?$", sub_path=pathlib.Path("./data"), resource_name="bug")
)

from graflo.caster import IngestionParams
from graflo.hq.caster import IngestionParams

caster = Caster(schema)

42 changes: 24 additions & 18 deletions docs/examples/example-5.md
@@ -64,7 +64,7 @@ The example uses a PostgreSQL database with a typical 3NF (Third Normal Form) sc

## Automatic Schema Inference

The `infer_schema_from_postgres()` function automatically analyzes your PostgreSQL database and creates a complete graflo Schema. This process involves several sophisticated steps:
The `GraphEngine.infer_schema()` method automatically analyzes your PostgreSQL database and creates a complete graflo Schema. This process involves several sophisticated steps:

### How Schema Inference Works

@@ -249,7 +249,7 @@ Make sure the corresponding database container is running before starting ingest

Automatically generate a graflo Schema from your PostgreSQL database. This is the core of the automatic inference process:

**What `infer_schema_from_postgres()` does:**
**What `GraphEngine.infer_schema()` does:**

1. **Queries PostgreSQL Information Schema**: The method queries PostgreSQL's information schema to discover all tables in the specified schema. It retrieves column information (names, types, constraints), identifies primary keys and foreign keys, and understands table relationships (an illustrative introspection query is sketched below).
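
The exact introspection queries are internal to graflo; the sketch below, using `psycopg2` directly against a hypothetical database, only illustrates the kind of `information_schema` lookup that discovers foreign keys:

```python
# Illustrative only: a hand-rolled information_schema query for foreign keys,
# not the query graflo actually issues during inference.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="mydb", user="postgres")  # hypothetical DSN
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT tc.table_name, kcu.column_name,
               ccu.table_name AS referenced_table, ccu.column_name AS referenced_column
        FROM information_schema.table_constraints AS tc
        JOIN information_schema.key_column_usage AS kcu
          ON tc.constraint_name = kcu.constraint_name
        JOIN information_schema.constraint_column_usage AS ccu
          ON tc.constraint_name = ccu.constraint_name
        WHERE tc.constraint_type = 'FOREIGN KEY'
          AND tc.table_schema = 'public';
        """
    )
    for table, column, ref_table, ref_column in cur.fetchall():
        print(f"{table}.{column} -> {ref_table}.{ref_column}")
conn.close()
```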

@@ -263,7 +263,7 @@ Automatically generate a graflo Schema from your PostgreSQL database. This is th

```python

from graflo.db.inferencer import infer_schema_from_postgres
from graflo.hq import GraphEngine
from graflo.onto import DBFlavor
from graflo.db.connection.onto import ArangoConfig, Neo4jConfig, TigergraphConfig, FalkordbConfig
from graflo.db import DBType
@@ -280,11 +280,11 @@ db_flavor = (
else DBFlavor.ARANGO
)

# Infer schema automatically
schema = infer_schema_from_postgres(
# Create GraphEngine and infer schema automatically
engine = GraphEngine(target_db_flavor=db_flavor)
schema = engine.infer_schema(
postgres_conn,
schema_name="public", # PostgreSQL schema name
db_flavor=db_flavor # Target graph database flavor
)
```

@@ -329,11 +329,14 @@ Create `Patterns` that map PostgreSQL tables to resources:

```python

from graflo.db.inferencer import create_patterns_from_postgres
from graflo.hq import GraphEngine

# Create GraphEngine instance
engine = GraphEngine()

# Create patterns from PostgreSQL tables
patterns = create_patterns_from_postgres(
postgres_conn,
patterns = engine.create_patterns(
postgres_conf,
schema_name="public"
)
```
@@ -374,15 +377,15 @@ from graflo import Caster
caster = Caster(schema)

# Ingest data from PostgreSQL into graph database
from graflo.caster import IngestionParams
from graflo.hq.caster import IngestionParams

ingestion_params = IngestionParams(
clean_start=True, # Clear existing data first
)

caster.ingest(
output_config=target_config, # Target graph database config
patterns=patterns, # PostgreSQL table patterns
patterns=patterns, # PostgreSQL table patterns
ingestion_params=ingestion_params,
)

@@ -405,7 +408,7 @@ from graflo.db import DBType
from graflo.db.postgres import (
PostgresConnection,
)
from graflo.db.inferencer import infer_schema_from_postgres, create_patterns_from_postgres
from graflo.hq import GraphEngine
from graflo.db.connection.onto import ArangoConfig, PostgresConfig

logger = logging.getLogger(__name__)
@@ -431,10 +434,11 @@ db_flavor = (
else DBFlavor.ARANGO
)

schema = infer_schema_from_postgres(
# Create GraphEngine and infer schema
engine = GraphEngine(target_db_flavor=db_flavor)
schema = engine.infer_schema(
postgres_conn,
schema_name="public",
db_flavor=db_flavor
)

# Step 5: Save inferred schema to YAML (optional)
@@ -444,10 +448,11 @@ with open(schema_output_file, "w") as f:
logger.info(f"Inferred schema saved to {schema_output_file}")

# Step 6: Create Patterns from PostgreSQL tables
patterns = create_patterns_from_postgres(postgres_conn, schema_name="public")
engine = GraphEngine()
patterns = engine.create_patterns(postgres_conf, schema_name="public")

# Step 7: Create Caster and ingest data
from graflo.caster import IngestionParams
from graflo.hq.caster import IngestionParams

caster = Caster(schema)

@@ -709,8 +714,9 @@ This pattern is particularly useful for:
After inference, you can modify the schema:

```python
# Infer schema
schema = infer_schema_from_postgres(postgres_conn, schema_name="public")
# Create GraphEngine and infer schema
engine = GraphEngine()
schema = engine.infer_schema(postgres_conn, schema_name="public")

# Modify schema as needed
# Add custom transforms, filters, or additional edges
4 changes: 2 additions & 2 deletions docs/examples/example-6.md
@@ -83,7 +83,7 @@ registry = DataSourceRegistry()
registry.register(api_source, resource_name="users")

# Create caster and ingest
from graflo.caster import IngestionParams
from graflo.hq.caster import IngestionParams

caster = Caster(schema)
# Load config from file
@@ -216,7 +216,7 @@ file_source = DataSourceFactory.create_file_data_source(path="users_backup.json"
registry.register(file_source, resource_name="users")

# Both will be processed and combined
from graflo.caster import IngestionParams
from graflo.hq.caster import IngestionParams

ingestion_params = IngestionParams() # Use default parameters

16 changes: 9 additions & 7 deletions docs/getting_started/quickstart.md
@@ -76,7 +76,7 @@ patterns = Patterns(
}
)

from graflo.caster import IngestionParams
from graflo.hq.caster import IngestionParams

ingestion_params = IngestionParams(
clean_start=False, # Set to True to wipe existing database
@@ -109,18 +109,20 @@ You can ingest data directly from PostgreSQL tables. First, infer the schema fro

```python
from graflo.db.postgres import PostgresConnection
from graflo.db.inferencer import infer_schema_from_postgres, create_patterns_from_postgres
from graflo.hq import GraphEngine
from graflo.db.connection.onto import PostgresConfig

# Connect to PostgreSQL
pg_config = PostgresConfig.from_docker_env() # Or from_env(), or create directly
pg_conn = PostgresConnection(pg_config)

# Infer schema from PostgreSQL (automatically detects vertices and edges)
schema = infer_schema_from_postgres(pg_conn, schema_name="public")
# Create GraphEngine and infer schema from PostgreSQL (automatically detects vertices and edges)
engine = GraphEngine()
schema = engine.infer_schema(pg_conn, schema_name="public")

# Create patterns from PostgreSQL tables
patterns = create_patterns_from_postgres(pg_conn, schema_name="public")
engine = GraphEngine()
patterns = engine.create_patterns(pg_config, schema_name="public")

# Or create patterns manually
from graflo.util.onto import Patterns, TablePattern
@@ -141,7 +143,7 @@ from graflo.db.connection.onto import ArangoConfig
arango_config = ArangoConfig.from_docker_env() # Target graph database
caster = Caster(schema)

from graflo.caster import IngestionParams
from graflo.hq.caster import IngestionParams

ingestion_params = IngestionParams(
clean_start=False, # Set to True to wipe existing database
Expand Down Expand Up @@ -187,7 +189,7 @@ registry = DataSourceRegistry()
registry.register(api_source, resource_name="users")

# Ingest
from graflo.caster import IngestionParams
from graflo.hq.caster import IngestionParams

caster = Caster(schema)

3 changes: 3 additions & 0 deletions docs/reference/architecture/onto_sql.md
@@ -0,0 +1,3 @@
# `graflo.architecture.onto_sql`

::: graflo.architecture.onto_sql
3 changes: 0 additions & 3 deletions docs/reference/caster.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/reference/data_source/index.md
@@ -160,7 +160,7 @@ source = DataSourceFactory.create_sql_data_source(config)

```python
from graflo import Caster, DataSourceRegistry
from graflo.caster import IngestionParams
from graflo.hq.caster import IngestionParams

registry = DataSourceRegistry()
registry.register(file_source, resource_name="users")
1 change: 0 additions & 1 deletion docs/reference/db/memgraph/__init__.md
@@ -1,4 +1,3 @@
# `graflo.db.memgraph`

::: graflo.db.memgraph

1 change: 0 additions & 1 deletion docs/reference/db/memgraph/conn.md
@@ -1,4 +1,3 @@
# `graflo.db.memgraph.conn`

::: graflo.db.memgraph.conn

3 changes: 3 additions & 0 deletions docs/reference/db/postgres/fuzzy_matcher.md
@@ -0,0 +1,3 @@
# `graflo.db.postgres.fuzzy_matcher`

::: graflo.db.postgres.fuzzy_matcher
3 changes: 3 additions & 0 deletions docs/reference/db/postgres/heuristics.md
@@ -0,0 +1,3 @@
# `graflo.db.postgres.heuristics`

::: graflo.db.postgres.heuristics
3 changes: 3 additions & 0 deletions docs/reference/db/postgres/inference_utils.md
@@ -0,0 +1,3 @@
# `graflo.db.postgres.inference_utils`

::: graflo.db.postgres.inference_utils
3 changes: 3 additions & 0 deletions docs/reference/db/postgres/util.md
@@ -0,0 +1,3 @@
# `graflo.db.postgres.util`

::: graflo.db.postgres.util