Merged
71 changes: 71 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,71 @@
# Changelog

All notable changes to pgslice will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.2.0] - 2025-12-28

### Added
- **CLI-first design**: pgslice now works as a CLI tool that can dump records without entering the REPL
- `--table TABLE` + `--pks PK_VALUES`: Dump specific records by primary key (comma-separated)
- `--timeframe COLUMN:START:END`: Filter main table by date range (alternative to `--pks`)
- `--truncate TABLE:COL:START:END`: Apply timeframe filters to related tables (repeatable)
- `--output FILE`: Write output to file (default: stdout for easy piping)
- `--wide`: Enable wide mode (follow self-referencing FKs)
- `--keep-pks`: Keep original primary key values instead of remapping
- `--graph`: Display table relationship graph after dump completes
- **Schema introspection commands**:
- `--tables`: List all tables in the schema with formatted output
- `--describe TABLE`: Show table structure and relationships
- **Schema DDL generation**: New `--create-schema` flag for dump command
- Generates `CREATE DATABASE` statements (PostgreSQL does not support `IF NOT EXISTS` for databases)
- Generates `CREATE SCHEMA IF NOT EXISTS` for all schemas
- Generates `CREATE TABLE IF NOT EXISTS` with complete table definitions
- Includes columns, primary keys, unique constraints, and foreign keys
- Handles circular dependencies via ALTER TABLE statements
- Supports all PostgreSQL data types including arrays and user-defined types
- Schema and table DDL uses IF NOT EXISTS for idempotency (can run multiple times safely)
- Works with both `--keep-pks` and default PK remapping modes
- **Dependency graph visualization**: New `--graph` flag displays ASCII art graph of table relationships
- Shows record counts per table
- Displays FK relationships between tables
- Highlights root table(s) in the graph
- **Progress indicators**: Visual feedback for long-running operations
- Spinner animation during traversal operations
- Progress bar enabled in both CLI and REPL modes
- Automatically disabled when output is piped (not a TTY)
- **Centralized operations module**: New `pgslice.operations` package for shared CLI/REPL logic
- `operations/dump_ops.py`: Shared dump execution logic
- `operations/parsing.py`: Timeframe/truncate filter parsing utilities
- `operations/schema_ops.py`: List tables and describe table operations
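
The filter formats above (`COLUMN:START:END` for `--timeframe`, `TABLE:COL:START:END` for `--truncate`) can be parsed with a simple split. This is an illustrative sketch; the actual names in `operations/parsing.py` may differ:

```python
# Hypothetical sketch of the truncate-filter parsing; assumes START/END are
# date-only values (no colons), as in "2024-01-01".
from dataclasses import dataclass


@dataclass
class TruncateFilter:
    table: str
    column: str
    start: str
    end: str


def parse_truncate(spec: str) -> TruncateFilter:
    """Parse a TABLE:COL:START:END spec such as
    'rental:rental_date:2024-01-01:2024-12-31'."""
    parts = spec.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected TABLE:COL:START:END, got {spec!r}")
    table, column, start, end = parts
    return TruncateFilter(table, column, start, end)
```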

### Changed
- **REPL mode improvements**:
- Renamed `--timeframe` flag to `--truncate` for clarity (applies to related tables, not main table)
- Enabled progress bar in REPL mode for better user feedback
- Updated to use centralized operations from `operations/` module
- Improved help text and error messages
- **Logging behavior**: Log level now defaults to disabled unless `--log-level` is explicitly specified
- **Code organization**:
- Refactored CLI to support both interactive REPL and non-interactive CLI modes
- Eliminated code duplication between CLI and REPL by introducing shared operations
- `SQLGenerator.generate_batch()` now accepts optional DDL parameters: `create_schema`, `database_name`, `schema_name`
- `AppConfig` dataclass includes new `create_schema: bool = False` field

### Fixed
- Removed `IF NOT EXISTS` from `CREATE DATABASE` statement (PostgreSQL doesn't support it)

### Technical Details
- **New modules**:
- `pgslice.dumper.ddl_generator.DDLGenerator`: DDL generation for schema dumps
- `pgslice.dumper.dump_service.DumpService`: Centralized dump service
- `pgslice.operations`: Package with shared CLI/REPL operations
- `pgslice.utils.graph_visualizer`: Dependency graph visualization
- `pgslice.utils.spinner`: Spinner animation for progress indication
- **Test coverage**: 8 new test modules added, maintained >93% overall code coverage
- **Dependency management**: Uses Kahn's algorithm for table dependency ordering in DDL generation
- **Architecture**: Cleaner separation between CLI routing, REPL mode, and shared operations
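
The table-dependency ordering mentioned above can be sketched with Kahn's algorithm. This is a simplified illustration (pgslice's internal API will differ), including the fallback that defers cyclic foreign keys to `ALTER TABLE` statements:

```python
# Illustrative sketch only: order tables so that every FK parent is
# created before its children; FKs that belong to a cycle are returned
# separately so they can be added later via ALTER TABLE ADD CONSTRAINT.
from collections import deque


def topo_order(tables, fk_edges):
    """tables: list of names; fk_edges: (child, parent) pairs meaning
    child references parent. Returns (ordered_tables, deferred_edges)."""
    indegree = {t: 0 for t in tables}
    children = {t: [] for t in tables}
    for child, parent in fk_edges:
        indegree[child] += 1
        children[parent].append(child)
    queue = deque(t for t in tables if indegree[t] == 0)
    ordered = []
    while queue:
        t = queue.popleft()
        ordered.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # Tables never reaching indegree 0 are part of a cycle: emit their
    # CREATE TABLE last, without the cyclic FKs, and defer those FKs.
    remaining = [t for t in tables if t not in set(ordered)]
    deferred = [(c, p) for c, p in fk_edges if c in remaining]
    return ordered + remaining, deferred
```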
48 changes: 1 addition & 47 deletions Makefile
@@ -54,41 +54,6 @@ clean: ## Remove build artifacts and cache
show-version: ## Show current version from pyproject.toml
@uv version

bump-patch: ## Bump patch version (0.1.1 -> 0.1.2)
@uv version --bump patch

bump-minor: ## Bump minor version (0.1.1 -> 0.2.0)
@uv version --bump minor

bump-major: ## Bump major version (0.1.1 -> 1.0.0)
@uv version --bump major

# Python package building and publishing
build-dist: clean ## Build Python distribution packages (wheel + sdist)
@echo "Building distribution packages..."
uv build
@echo "Build complete! Packages in dist/"
@ls -lh dist/

install-local: build-dist ## Install package locally from built wheel
@echo "Installing from local build..."
uv pip install dist/*.whl --force-reinstall
@echo "Installation complete! Test with: pgslice --version"

publish-test: build-dist ## Publish to TestPyPI for testing
@echo "Publishing to TestPyPI..."
uv publish --publish-url https://test.pypi.org/legacy/
@echo "Published to TestPyPI! Install with:"
@echo " pip install --index-url https://test.pypi.org/simple/ pgslice"

publish: all-checks build-dist ## Publish to production PyPI (requires confirmation)
@echo "WARNING: This will publish to production PyPI!"
@read -p "Version $$(grep '^version = ' pyproject.toml | cut -d'"' -f2) - Continue? [y/N] " confirm && \
[ "$$confirm" = "y" ] || [ "$$confirm" = "Y" ] || (echo "Aborted." && exit 1)
@echo "Publishing to PyPI..."
uv publish
@echo "Published! Install with: pip install pgslice"

# Docker commands
docker-build: ## Build Docker image
docker build -t $(DOCKER_IMAGE) .
@@ -123,20 +88,9 @@ uv-install: ## Install uv (one-time setup)
sync: ## Sync dependencies with uv (local development)
uv sync --all-extras

lock: ## Update uv.lock file
uv lock

test-compat: ## Test compatibility across Python versions
@echo "Testing Python 3.10..."
@uv run --python 3.10 python --version || echo "Python 3.10 not available"
@echo "Testing Python 3.13..."
@uv run --python 3.13 python --version || echo "Python 3.13 not available"
@echo "Testing Python 3.14..."
@uv run --python 3.14 python --version || echo "Python 3.14 not available"

setup: ## One-time local development setup
@echo "Copying env file..."
cp .env.template .env
cp .env.example .env
@echo "Setting up local development environment..."
@command -v uv >/dev/null 2>&1 || (echo "Installing uv..." && curl -LsSf https://astral.sh/uv/install.sh | sh)
@echo "Installing Python 3.14..."
148 changes: 123 additions & 25 deletions README.md
@@ -32,12 +32,14 @@ Extract only what you need while maintaining referential integrity.

## Features

- ✅ **CLI-first design**: Stream SQL to stdout for easy piping and scripting
- ✅ **Bidirectional FK traversal**: Follows relationships in both directions (forward and reverse)
- ✅ **Circular relationship handling**: Prevents infinite loops with visited tracking
- ✅ **Multiple records**: Extract multiple records in one operation
- ✅ **Timeframe filtering**: Filter specific tables by date ranges
- ✅ **PK remapping**: Auto-remaps auto-generated primary keys for clean imports
- ✅ **Interactive REPL**: User-friendly command-line interface
- ✅ **DDL generation**: Optionally include CREATE DATABASE/SCHEMA/TABLE statements for self-contained dumps
- ✅ **Progress bar**: Visual progress indicator for dump operations
- ✅ **Schema caching**: SQLite-based caching for improved performance
- ✅ **Type-safe**: Full type hints with mypy strict mode
- ✅ **Secure**: SQL injection prevention, secure password handling
@@ -83,40 +85,136 @@ See [DEVELOPMENT.md](DEVELOPMENT.md) for detailed development setup instructions

## Quick Start

### CLI Mode

The CLI mode streams SQL to stdout by default, making it easy to pipe or redirect output:

```bash
# In REPL:
# This dumps all records related to the film with id 1
# The generated SQL file is placed, by default, in ~/.pgslice/dumps
# The file name is a formatted string with table name, id, and timestamp
pgslice> dump "film" 1
# Basic dump to stdout (pipe to file)
PGPASSWORD=xxx pgslice --host localhost --database mydb --table users --pks 42 > user_42.sql

# You can override the output path with:
pgslice> dump "film" 1 --output film_1.sql
# Multiple records
PGPASSWORD=xxx pgslice --host localhost --database mydb --table users --pks 1,2,3 > users.sql

# Extract multiple records
pgslice> dump "actor" 1,2,3 --output multiple_actors.sql
# Output directly to file with --output flag
pgslice --host localhost --database mydb --table users --pks 42 --output user_42.sql

# Use wide mode to follow all relationships (including self-referencing FKs)
# Be cautious: this can result in much larger datasets
pgslice> dump "customer" 42 --wide --output customer_42.sql
# Dump by timeframe (instead of PKs) - filters main table by date range
pgslice --host localhost --database mydb --table orders \
--timeframe "created_at:2024-01-01:2024-12-31" > orders_2024.sql

# Apply timeframe filter
pgslice> dump "customer" 42 --timeframe "rental:rental_date:2024-01-01:2024-12-31"
# Wide mode: follow all relationships including self-referencing FKs
# Be cautious - this can result in larger datasets
pgslice --host localhost --database mydb --table customer --pks 42 --wide > customer.sql

# List all tables
pgslice> tables
# Keep original primary keys (no remapping)
pgslice --host localhost --database mydb --table film --pks 1 --keep-pks > film.sql

# Generate self-contained SQL with DDL statements
# Includes CREATE DATABASE/SCHEMA/TABLE statements
pgslice --host localhost --database mydb --table film --pks 1 --create-schema > film_complete.sql

# Apply truncate filter to limit related tables by date range
pgslice --host localhost --database mydb --table customer --pks 42 \
--truncate "rental:rental_date:2024-01-01:2024-12-31" > customer.sql

# Enable debug logging (writes to stderr)
pgslice --host localhost --database mydb --table users --pks 42 \
--log-level DEBUG 2>debug.log > output.sql
```

### Schema Exploration

```bash
# List all tables in the schema
pgslice --host localhost --database mydb --tables

# Describe table structure and relationships
pgslice --host localhost --database mydb --describe users
```

### SSH Remote Execution

Run pgslice on a remote server and capture output locally:

```bash
# Execute on remote server, save output locally
ssh remote.server.com "PGPASSWORD=xxx pgslice --host db.internal --database mydb \
--table users --pks 1 --create-schema" > local_dump.sql

# With SSH tunnel for database access
ssh -f -N -L 5433:db.internal:5432 bastion.example.com
PGPASSWORD=xxx pgslice --host localhost --port 5433 --database mydb \
--table users --pks 42 > user.sql
```

### Interactive REPL

```bash
# Start interactive REPL
pgslice --host localhost --database mydb

pgslice> dump "film" 1 --output film_1.sql
pgslice> tables
pgslice> describe "film"
```

## CLI vs REPL: Output Behavior

Understanding the difference between CLI and REPL modes:

### CLI Mode (stdout by default)
The CLI streams SQL to **stdout** by default, perfect for piping and scripting:

# Keep original primary key values (no remapping)
# By default, new primary key values are assigned dynamically and conflicts
# are handled gracefully, meaning you can run the same file multiple times
# without conflicts arising.
# If you want to keep the original IDs, run:
pgslice> dump "film" 1 --keep-pks --output film_1.sql
```bash
# Streams to stdout - redirect with >
pgslice --table users --pks 42 > user_42.sql

# Or use --output flag
pgslice --table users --pks 42 --output user_42.sql

# Pipe to other commands
pgslice --table users --pks 42 | gzip > user_42.sql.gz
```
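
Because the SQL goes to stdout, any progress indicator has to stay off that stream. Below is a stdlib-only sketch of the TTY check behind the "automatically disabled when piped" behavior; pgslice itself uses tqdm, and the names here are illustrative:

```python
# Hypothetical sketch: draw progress on stderr only when it is an
# interactive terminal, so piped/redirected output stays clean.
import sys


def progress(iterable, total=None, stream=sys.stderr):
    """Yield items unchanged, printing a simple counter to `stream`
    only when it is a TTY; silent when output is piped."""
    show = stream.isatty()
    for i, item in enumerate(iterable, 1):
        if show:
            suffix = f"/{total}" if total else ""
            print(f"\rprocessed {i}{suffix}", end="", file=stream)
        yield item
    if show:
        print(file=stream)
```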

### REPL Mode (files by default)
The REPL writes to **`~/.pgslice/dumps/`** by default when `--output` is not specified:

```bash
# In REPL: writes to ~/.pgslice/dumps/public_users_42.sql
pgslice> dump "users" 42

# Specify custom output path
pgslice> dump "users" 42 --output /path/to/user.sql
```

### Same Operations, Different Modes

| Operation | CLI | REPL |
|-----------|-----|------|
| **List tables** | `pgslice --tables` | `pgslice> tables` |
| **Describe table** | `pgslice --describe users` | `pgslice> describe "users"` |
| **Dump to stdout** | `pgslice --table users --pks 42` | N/A (REPL always writes to file) |
| **Dump to file** | `pgslice --table users --pks 42 --output user.sql` | `pgslice> dump "users" 42 --output user.sql` |
| **Dump (default)** | Stdout | `~/.pgslice/dumps/public_users_42.sql` |
| **Multiple PKs** | `pgslice --table users --pks 1,2,3` | `pgslice> dump "users" 1,2,3` |
| **Truncate filter** | `pgslice --table users --pks 42 --truncate "orders:created_at:2024-01-01:2024-12-31"` | `pgslice> dump "users" 42 --truncate "orders:created_at:2024-01-01:2024-12-31"` |
| **Wide mode** | `pgslice --table users --pks 42 --wide` | `pgslice> dump "users" 42 --wide` |

### When to Use Each Mode

**Use CLI mode when:**
- Piping output to other commands
- Scripting and automation
- Remote execution via SSH
- One-off dumps

**Use REPL mode when:**
- Exploring database schema interactively
- Running multiple dumps in a session
- You prefer persistent file output
- Testing different dump configurations

## Configuration

Key environment variables (see `.env.example` for full reference):
@@ -131,7 +229,7 @@ Key environment variables (see `.env.example` for full reference):
| `PGPASSWORD` | Database password (env var only) | - |
| `CACHE_ENABLED` | Enable schema caching | `true` |
| `CACHE_TTL_HOURS` | Cache time-to-live | `24` |
| `LOG_LEVEL` | Logging level | `INFO` |
| `LOG_LEVEL` | Logging level (disabled by default unless specified) | disabled |
| `PGSLICE_OUTPUT_DIR` | Output directory | `~/.pgslice/dumps` |
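
A minimal sketch of how these variables might map onto the `AppConfig` dataclass mentioned in the changelog — field names beyond `create_schema` are assumptions, not pgslice's exact code:

```python
# Hypothetical sketch of env-driven configuration with defaults.
import os
from dataclasses import dataclass


@dataclass
class AppConfig:
    cache_enabled: bool = True
    cache_ttl_hours: int = 24
    output_dir: str = os.path.expanduser("~/.pgslice/dumps")
    create_schema: bool = False  # field documented in the changelog


def config_from_env(env=os.environ) -> AppConfig:
    """Read settings from environment variables, falling back to defaults."""
    return AppConfig(
        cache_enabled=env.get("CACHE_ENABLED", "true").lower() == "true",
        cache_ttl_hours=int(env.get("CACHE_TTL_HOURS", "24")),
        output_dir=env.get(
            "PGSLICE_OUTPUT_DIR", os.path.expanduser("~/.pgslice/dumps")
        ),
    )
```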

## Security
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -30,6 +30,7 @@ dependencies = [
"printy==3.0.0",
"tabulate>=0.9.0",
"python-dotenv>=1.0.0",
"tqdm>=4.66.0",
]

[project.optional-dependencies]
@@ -104,6 +105,10 @@ ignore_missing_imports = true
module = "tabulate.*"
ignore_missing_imports = true

[[tool.mypy.overrides]]
module = "tqdm.*"
ignore_missing_imports = true

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"