115 changes: 115 additions & 0 deletions README.md
@@ -42,6 +42,7 @@ Extract only what you need while maintaining referential integrity.
- ✅ **Multiple records**: Extract multiple records in one operation
- ✅ **Timeframe filtering**: Filter specific tables by date ranges
- ✅ **PK remapping**: Auto-remaps auto-generated primary keys for clean imports
- ✅ **Natural key support**: Idempotent SQL generation for tables without unique constraints
- ✅ **DDL generation**: Optionally include CREATE DATABASE/SCHEMA/TABLE statements for self-contained dumps
- ✅ **Progress bar**: Visual progress indicator for dump operations
- ✅ **Schema caching**: SQLite-based caching for improved performance
@@ -181,6 +182,8 @@ pgslice --host localhost --database mydb --dump users --pks 42 \
--log-level DEBUG 2>debug.log
```

**Transaction Safety**: All generated SQL dumps are wrapped in `BEGIN`/`COMMIT` transactions by default. If any part of the import fails, everything automatically rolls back, leaving your database unchanged.
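The all-or-nothing behaviour can be illustrated with a stand-in: the sketch below uses Python's built-in `sqlite3` instead of PostgreSQL, and the `roles` table and malformed dump are invented purely for the demo.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here; the point is the BEGIN/COMMIT wrapper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roles (id INTEGER PRIMARY KEY, name TEXT)")

dump_sql = """
BEGIN;
INSERT INTO roles (name) VALUES ('Admin');
INSERT INTO roles (name) VALUES (NULL, 'oops');  -- column-count mismatch aborts here
COMMIT;
"""
try:
    conn.executescript(dump_sql)
except sqlite3.Error:
    conn.rollback()  # COMMIT never ran, so everything in the dump is undone

count = conn.execute("SELECT COUNT(*) FROM roles").fetchone()[0]
print(count)  # 0 -- the first, valid INSERT was rolled back too
```

Because the failing statement sits inside the same transaction as the valid one, the database ends up exactly as it started.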

### Schema Exploration

```bash
@@ -202,6 +205,118 @@ pgslice> tables
pgslice> describe film
```

## Idempotent Imports with Natural Keys

### What Are Natural Keys?

**Natural keys** are columns (or combinations of columns) that uniquely identify a record by its business meaning, even without explicit database constraints. They represent the "real-world" identifier for your data.

Examples:
- `roles.name` - Role names like "Admin", "User", "Guest" are naturally unique
- `statuses.code` - Status codes like "ACTIVE", "INACTIVE", "PENDING"
- `(tenant_id, setting_key)` - Configuration settings in multi-tenant systems
- `countries.iso_code` - ISO country codes like "US", "CA", "GB"

### Why Use Natural Keys?

By default, pgslice remaps auto-generated primary keys (SERIAL, IDENTITY) to avoid conflicts when importing. However, this can create duplicate records if you reimport the same dump multiple times:

```sql
-- First import: Creates record with new id=1
INSERT INTO roles (name) VALUES ('Admin');

-- Second import: Creates duplicate with new id=2 (no UNIQUE constraint to prevent it!)
INSERT INTO roles (name) VALUES ('Admin');
```

The `--natural-keys` flag solves this by generating **idempotent SQL** - scripts that check "does a record with this natural key already exist?" before inserting. Run the same dump multiple times safely with no duplicates.

### When to Use `--natural-keys`

Use this flag when:
- ✅ Tables have auto-generated PKs (SERIAL, IDENTITY columns)
- ✅ You need to reimport dumps multiple times (development, testing, CI/CD)
- ✅ Tables lack explicit UNIQUE constraints on natural key columns
- ✅ You need composite natural keys (multiple columns for uniqueness)
- ✅ Auto-detection fails or you want explicit control

### Usage Examples

```bash
# Single-column natural key (common for reference/lookup tables)
pgslice --host localhost --database mydb --dump users --pks 42 \
--natural-keys "roles=name"

# With schema prefix (explicit schema)
pgslice --host localhost --database mydb --dump users --pks 42 \
--natural-keys "public.roles=name"

# Composite natural key (multiple columns define uniqueness)
pgslice --host localhost --database mydb --dump customers --pks 1 \
--natural-keys "tenant_settings=tenant_id,setting_key"

# Multiple tables (semicolon-separated)
pgslice --host localhost --database mydb --dump orders --pks 123 \
--natural-keys "roles=name;statuses=code;countries=iso_code"

# Complex example with mixed single and composite keys
pgslice --host localhost --database mydb --dump products --pks 456 \
--natural-keys "roles=name;tenant_configs=tenant_id,config_key;categories=slug"
```

**Format**: `--natural-keys "schema.table=col1,col2;other_table=col1;..."`
- Tables separated by `;`
- Columns separated by `,`
- Schema prefix optional (defaults to `public`)
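The grammar above reduces to a few string splits. This condensed sketch mirrors the `parse_natural_keys` helper this PR adds to `src/pgslice/cli.py`, with the validation and error reporting omitted:

```python
def parse_natural_keys(spec: str) -> dict[str, list[str]]:
    """Parse 'schema.table=col1,col2;other=col' into {table: [cols]}."""
    result: dict[str, list[str]] = {}
    # Tables are ';'-separated; each entry is 'table=comma,separated,cols'.
    for part in filter(None, (p.strip() for p in spec.split(";"))):
        table, _, cols = part.partition("=")
        result[table.strip()] = [c.strip() for c in cols.split(",") if c.strip()]
    return result

print(parse_natural_keys("roles=name;tenant_settings=tenant_id,setting_key"))
# {'roles': ['name'], 'tenant_settings': ['tenant_id', 'setting_key']}
```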

### Auto-Detection

pgslice automatically detects natural keys in this priority order:

1. **Manual specification** (highest priority) - Your `--natural-keys` flag
2. **Common column names** - Recognizes patterns like:
- Exact matches: `name`, `code`, `slug`, `email`, `username`, `key`, `identifier`, `handle`
- Suffix patterns: `*_code`, `*_key`, `*_identifier`, `*_slug`
3. **Reference table heuristic** - Small tables (2-3 columns) with one non-PK text column
4. **Error if none found** - Suggests using `--natural-keys` manually

For most reference tables (roles, statuses, categories), auto-detection works out of the box. Use manual specification for:
- Tables with unconventional column names
- Composite natural keys
- When you want explicit control

### How It Works

When natural keys are specified, pgslice generates CTE-based SQL that:

1. Checks if records with matching natural keys already exist
2. Only inserts records that don't exist yet
3. Maps old primary keys to new (or existing) primary keys for foreign key resolution
4. Ensures idempotency - running multiple times produces the same result

Example generated SQL structure:
```sql
WITH to_insert AS (
-- Values to potentially insert
SELECT * FROM (VALUES (...)) AS v(...)
),
existing AS (
-- Find records that already exist by natural key
SELECT t.id, ti.old_id
FROM roles t
INNER JOIN to_insert ti ON t.name IS NOT DISTINCT FROM ti.name
),
inserted AS (
-- Insert only new records (skip existing)
INSERT INTO roles (name, permissions)
SELECT name, permissions FROM to_insert
WHERE old_id NOT IN (SELECT old_id FROM existing)
RETURNING id, name
)
-- Map old IDs to new IDs for FK resolution
...
```
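The net effect of this pattern can be reproduced end to end with Python's `sqlite3` as a stand-in for PostgreSQL. SQLite's `IS` operator plays the role of `IS NOT DISTINCT FROM`, and the `roles` table here is invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Note: no UNIQUE constraint on name -- exactly the case --natural-keys targets.
conn.execute("CREATE TABLE roles (id INTEGER PRIMARY KEY, name TEXT)")

idempotent_insert = """
WITH to_insert(name) AS (VALUES ('Admin'), ('User'))
INSERT INTO roles (name)
SELECT name FROM to_insert ti
WHERE NOT EXISTS (SELECT 1 FROM roles r WHERE r.name IS ti.name)
"""

# Running the same dump twice must not create duplicates.
for _ in range(2):
    conn.execute(idempotent_insert)

count = conn.execute("SELECT COUNT(*) FROM roles").fetchone()[0]
print(count)  # 2, not 4: the second run inserts nothing
```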

## Configuration

Key environment variables (see `.env.example` for full reference):
85 changes: 85 additions & 0 deletions src/pgslice/cli.py
@@ -81,6 +81,74 @@ def parse_main_timeframe(spec: str) -> MainTableTimeframe:
)


def parse_natural_keys(spec: str) -> dict[str, list[str]]:
"""
Parse natural keys specification.

Format: schema.table=col1,col2;other_table=col1
Example: public.roles=name;public.statuses=code

Args:
spec: Natural keys specification string

Returns:
Dict mapping "schema.table" to list of column names

Raises:
InvalidTimeframeError: If specification is invalid

Examples:
>>> parse_natural_keys("public.roles=name")
{'public.roles': ['name']}

>>> parse_natural_keys("public.roles=name;statuses=code")
{'public.roles': ['name'], 'statuses': ['code']}

>>> parse_natural_keys("roles=name,code")
{'roles': ['name', 'code']}
"""
result: dict[str, list[str]] = {}

# Split by semicolon to get individual table specifications
table_specs = spec.split(";")

for table_spec in table_specs:
table_spec = table_spec.strip()
if not table_spec:
continue

# Split by = to get table and columns
if "=" not in table_spec:
raise InvalidTimeframeError(
f"Invalid natural key format: {table_spec}. "
"Expected: table=col1,col2 or schema.table=col1"
)

table_part, columns_part = table_spec.split("=", 1)
table_part = table_part.strip()
columns_part = columns_part.strip()

if not table_part or not columns_part:
raise InvalidTimeframeError(
f"Invalid natural key format: {table_spec}. "
"Both table and columns must be specified"
)

# Split columns by comma
columns = [col.strip() for col in columns_part.split(",")]
columns = [col for col in columns if col] # Remove empty strings

if not columns:
raise InvalidTimeframeError(
f"Invalid natural key format: {table_spec}. "
"At least one column must be specified"
)

result[table_part] = columns

return result


def fetch_pks_by_timeframe(
conn_manager: ConnectionManager,
table: str,
@@ -407,6 +475,15 @@ def main() -> int:
"-o",
help="Output file path (default: stdout)",
)
dump_group.add_argument(
"--natural-keys",
help=(
"Manually specify natural keys for tables without unique constraints. "
"Format: 'schema.table=col1,col2;other_table=col1'. "
"Enables idempotent INSERTs for tables with auto-generated PKs. "
"Example: 'public.roles=name;public.statuses=code'"
),
)

# Other arguments
parser.add_argument(
@@ -456,6 +533,14 @@ def main() -> int:
if args.log_level:
config.log_level = args.log_level

# Parse natural keys if provided
if args.natural_keys:
try:
config.natural_keys = parse_natural_keys(args.natural_keys)
except InvalidTimeframeError as e:
sys.stderr.write(f"Error: {e}\n")
return 1

# Validate CLI dump mode arguments
if args.dump and not args.pks and not args.timeframe:
sys.stderr.write(
1 change: 1 addition & 0 deletions src/pgslice/config.py
@@ -46,6 +46,7 @@ class AppConfig:
sql_batch_size: int = 100
output_dir: Path = Path.home() / ".pgslice" / "dumps"
create_schema: bool = False
natural_keys: dict[str, list[str]] | None = None


def load_config() -> AppConfig:
4 changes: 3 additions & 1 deletion src/pgslice/dumper/dump_service.py
@@ -128,7 +128,9 @@ def dump(
# Step 3: Generate SQL (using animated spinner)
with animated_spinner(spinner, pbar.set_description, "Generating SQL"):
generator = SQLGenerator(
introspector, batch_size=self.config.sql_batch_size
introspector,
batch_size=self.config.sql_batch_size,
natural_keys=self.config.natural_keys,
)
sql = generator.generate_batch(
sorted_records,