Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -145,6 +145,7 @@ Thumbs.db
.pgslice_history
schema_cache.db
output/
dumps/

# AI
CLAUDE.md
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.2.1] - 2025-12-29

### Fixed
- Docker volume permission issues with dedicated entrypoint script

### Changed
- Optimized graph traversal performance using batch queries for relationship lookups

## [0.2.0] - 2025-12-28

### Added
96 changes: 95 additions & 1 deletion DOCKER_USAGE.md
@@ -39,6 +39,44 @@ docker run --rm -it \
edraobdu/pgslice:latest \
pgslice --host your.db.host --port 5432 --user your_user --database your_db
```

### Connecting to Localhost Database

When your PostgreSQL database is running on your host machine (localhost), the container cannot access it using `localhost` or `127.0.0.1` because these refer to the container itself, not your host.

**Solution 1: Use host networking (Linux, simplest)**
```bash
docker run --rm -it \
--network host \
-v $(pwd)/dumps:/home/pgslice/.pgslice/dumps \
-e PGPASSWORD=your_password \
edraobdu/pgslice:latest \
pgslice --host localhost --port 5432 --user your_user --database your_db
```

**Solution 2: Use host.docker.internal (Mac/Windows)**
```bash
docker run --rm -it \
-v $(pwd)/dumps:/home/pgslice/.pgslice/dumps \
-e PGPASSWORD=your_password \
edraobdu/pgslice:latest \
pgslice --host host.docker.internal --port 5432 --user your_user --database your_db
```

**Solution 3: Use Docker bridge IP (Linux alternative)**
```bash
# Find your host's Docker bridge IP (usually 172.17.0.1):
#   docker network inspect bridge -f '{{range .IPAM.Config}}{{.Gateway}}{{end}}'
docker run --rm -it \
-v $(pwd)/dumps:/home/pgslice/.pgslice/dumps \
-e PGPASSWORD=your_password \
edraobdu/pgslice:latest \
pgslice --host 172.17.0.1 --port 5432 --user your_user --database your_db
```

**Note:** Make sure PostgreSQL is configured to accept connections from Docker containers:
- Edit `postgresql.conf`: set `listen_addresses = '*'` or `listen_addresses = '0.0.0.0'` (requires a server restart)
- Edit `pg_hba.conf`: add an entry like `host all all 172.17.0.0/16 md5` (covers the default Docker bridge network)
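Under an assumed Debian/Ubuntu layout, those two edits plus the required restart can be scripted as below; the `16/main` version path and the `172.17.0.0/16` subnet are assumptions, so adjust them to your installation:

```shell
# Assumed config paths for a Debian/Ubuntu PostgreSQL 16 install
PG_CONF=/etc/postgresql/16/main/postgresql.conf
PG_HBA=/etc/postgresql/16/main/pg_hba.conf

# Listen on all interfaces so the Docker bridge can reach the server
sudo sed -i "s/^#\?listen_addresses.*/listen_addresses = '*'/" "$PG_CONF"

# Allow password auth from the default Docker bridge subnet
echo "host all all 172.17.0.0/16 md5" | sudo tee -a "$PG_HBA"

# listen_addresses changes need a full restart (pg_hba.conf alone only needs a reload)
sudo systemctl restart postgresql
```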

### Using Environment File

Create a `.env` file:
@@ -86,7 +124,63 @@ Mount a local directory to persist SQL dumps:
-v $(pwd)/dumps:/home/pgslice/.pgslice/dumps
```

**Important:** The dumps directory is created inside the container with non-root user permissions (UID 1000).
#### Volume Permissions

The container runs as non-root user `pgslice` (UID 1000) for security. When mounting local directories:

**The entrypoint script automatically handles permissions** by:
1. Detecting mounted volumes
2. Fixing ownership to UID 1000 if needed
3. Providing helpful error messages if permissions can't be fixed
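The shipped `docker-entrypoint.sh` is not reproduced here; a minimal sketch of the behavior described above could look like the following (the dumps path matches the mount point used in the examples, and the messages are illustrative):

```shell
#!/bin/sh
# Illustrative sketch of docker-entrypoint.sh -- the actual script may differ.
set -e

DUMPS_DIR=/home/pgslice/.pgslice/dumps

# If a volume is mounted with the wrong owner, fix it (the container starts as root)
if [ -d "$DUMPS_DIR" ] && [ "$(stat -c %u "$DUMPS_DIR")" != "1000" ]; then
    if chown -R pgslice:pgslice "$DUMPS_DIR" 2>/dev/null; then
        echo "Fixed ownership of $DUMPS_DIR" >&2
    else
        echo "WARNING: could not fix ownership of $DUMPS_DIR." >&2
        echo "Run 'sudo chown -R 1000:1000 <host-dir>' or pass --user \$(id -u):\$(id -g)." >&2
    fi
fi

# Drop privileges and run the requested command as the pgslice user
exec su-exec pgslice "$@"
```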

**If you encounter permission errors:**

**Option 1: Pre-fix permissions on host (recommended)**
```bash
# Create dumps directory and set ownership
mkdir -p dumps
sudo chown -R 1000:1000 dumps

# Run container
docker run --rm -it \
-v $(pwd)/dumps:/home/pgslice/.pgslice/dumps \
edraobdu/pgslice:latest pgslice
```

**Option 2: Run as your user ID**
```bash
# Run container as your user (bypasses UID 1000)
docker run --rm -it \
-v $(pwd)/dumps:/home/pgslice/.pgslice/dumps \
--user $(id -u):$(id -g) \
edraobdu/pgslice:latest pgslice
```

**Why UID 1000?**
- Common default UID for first user on Linux systems
- Matches most developer workstations
- If your user is different, use `--user $(id -u):$(id -g)` flag

### Remote Server Workflow

When running pgslice on a remote server, dumps are written to files on that server, with progress displayed while the dump runs:

```bash
# SSH into remote server and run dump
ssh user@remote-server "docker run --rm \
-v /tmp/dumps:/home/pgslice/.pgslice/dumps \
--env-file .env \
edraobdu/pgslice:latest \
pgslice --dump users --pks 42"

# Copy the generated file to your local machine
scp user@remote-server:/tmp/dumps/public_users_42_*.sql ./local_dumps/

# Or use rsync for better performance with large files
rsync -avz user@remote-server:/tmp/dumps/public_users_42_*.sql ./local_dumps/
```

Progress bars are visible during the dump, and the file is ready to transfer when complete.

## Links

16 changes: 9 additions & 7 deletions Dockerfile
@@ -5,7 +5,7 @@ FROM python:3.13-alpine
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

# Install system dependencies
RUN apk add --no-cache postgresql-client
RUN apk add --no-cache postgresql-client su-exec

# Install the project into `/app`
WORKDIR /app
@@ -31,6 +31,9 @@ COPY . /app
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install --no-deps -e .

# Copy entrypoint script (must be done as root before USER directive)
COPY --chmod=755 docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh

# Set environment variables
ENV PYTHONUNBUFFERED=1

@@ -39,14 +42,13 @@ RUN adduser -D -u 1000 pgslice && \
mkdir -p /home/pgslice/.cache/pgslice /home/pgslice/.pgslice/dumps && \
chown -R pgslice:pgslice /app /home/pgslice

# Switch to non-root user
USER pgslice

# Update cache directory to use pgslice's home
ENV PGSLICE_CACHE_DIR=/home/pgslice/.cache/pgslice

# Reset the entrypoint, don't invoke `uv`
ENTRYPOINT []
# Note: the container starts as root; the entrypoint fixes volume permissions, then drops to the pgslice user

# Use custom entrypoint to fix permissions
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]

# Default command
# Default command (passed to entrypoint)
CMD ["pgslice", "--help"]
130 changes: 94 additions & 36 deletions README.md
@@ -32,7 +32,7 @@ Extract only what you need while maintaining referential integrity.

## Features

- ✅ **CLI-first design**: Stream SQL to stdout for easy piping and scripting
- ✅ **CLI-first design**: Dumps always saved to files with visible progress (matches REPL behavior)
- ✅ **Bidirectional FK traversal**: Follows relationships in both directions (forward and reverse)
- ✅ **Circular relationship handling**: Prevents infinite loops with visited tracking
- ✅ **Multiple records**: Extract multiple records in one operation
@@ -79,6 +79,65 @@ docker pull edraobdu/pgslice:0.1.1
docker pull --platform linux/amd64 edraobdu/pgslice:latest
```

#### Connecting to Localhost Database

When your PostgreSQL database runs on your host machine, use `--network host` (Linux) or `host.docker.internal` (Mac/Windows):

```bash
# Linux: Use host networking
docker run --rm -it \
--network host \
-v $(pwd)/dumps:/home/pgslice/.pgslice/dumps \
-e PGPASSWORD=your_password \
edraobdu/pgslice:latest \
pgslice --host localhost --database your_db --dump users --pks 42

# Mac/Windows: Use special hostname
docker run --rm -it \
-v $(pwd)/dumps:/home/pgslice/.pgslice/dumps \
-e PGPASSWORD=your_password \
edraobdu/pgslice:latest \
pgslice --host host.docker.internal --database your_db --dump users --pks 42
```

See [DOCKER_USAGE.md](DOCKER_USAGE.md#connecting-to-localhost-database) for more connection options.

#### Docker Volume Permissions

The pgslice container runs as user `pgslice` (UID 1000) for security. When mounting local directories as volumes, you may encounter permission issues.

**The entrypoint script automatically fixes permissions** on mounted volumes. However, if you still encounter issues:

```bash
# Fix permissions on host before mounting
sudo chown -R 1000:1000 ./dumps

# Then run normally
docker run --rm -it \
-v $(pwd)/dumps:/home/pgslice/.pgslice/dumps \
edraobdu/pgslice:latest \
pgslice --host your.db.host --database your_db --dump users --pks 42
```

**Alternative:** Run container as your user:
```bash
docker run --rm -it \
-v $(pwd)/dumps:/home/pgslice/.pgslice/dumps \
--user $(id -u):$(id -g) \
edraobdu/pgslice:latest \
pgslice --host your.db.host --database your_db --dump users --pks 42
```

**For remote servers:**
```bash
# Run dump on remote server
ssh user@remote-server "docker run --rm -v /tmp/dumps:/home/pgslice/.pgslice/dumps \
edraobdu/pgslice:latest pgslice --dump users --pks 42"

# Copy file locally
scp user@remote-server:/tmp/dumps/users_42_*.sql ./
```

### From Source (Development)

See [DEVELOPMENT.md](DEVELOPMENT.md) for detailed development setup instructions.
@@ -87,40 +146,40 @@ See [DEVELOPMENT.md](DEVELOPMENT.md) for detailed development setup instructions

### CLI Mode

The CLI mode streams SQL to stdout by default, making it easy to pipe or redirect output:
Dumps are always saved to files with visible progress indicators (helpful for large datasets):

```bash
# Basic dump to stdout (pipe to file)
PGPASSWORD=xxx pgslice --host localhost --database mydb --table users --pks 42 > user_42.sql
# Basic dump (auto-generates filename like: public_users_42_TIMESTAMP.sql)
PGPASSWORD=xxx pgslice --host localhost --database mydb --dump users --pks 42

# Multiple records
PGPASSWORD=xxx pgslice --host localhost --database mydb --table users --pks 1,2,3 > users.sql
PGPASSWORD=xxx pgslice --host localhost --database mydb --dump users --pks 1,2,3

# Output directly to file with --output flag
pgslice --host localhost --database mydb --table users --pks 42 --output user_42.sql
# Specify output file path
pgslice --host localhost --database mydb --dump users --pks 42 --output user_42.sql

# Dump by timeframe (instead of PKs) - filters main table by date range
pgslice --host localhost --database mydb --table orders \
--timeframe "created_at:2024-01-01:2024-12-31" > orders_2024.sql
pgslice --host localhost --database mydb --dump orders \
--timeframe "created_at:2024-01-01:2024-12-31" --output orders_2024.sql

# Wide mode: follow all relationships including self-referencing FKs
# Be cautious - this can result in larger datasets
pgslice --host localhost --database mydb --table customer --pks 42 --wide > customer.sql
pgslice --host localhost --database mydb --dump customer --pks 42 --wide

# Keep original primary keys (no remapping)
pgslice --host localhost --database mydb --table film --pks 1 --keep-pks > film.sql
pgslice --host localhost --database mydb --dump film --pks 1 --keep-pks

# Generate self-contained SQL with DDL statements
# Includes CREATE DATABASE/SCHEMA/TABLE statements
pgslice --host localhost --database mydb --table film --pks 1 --create-schema > film_complete.sql
pgslice --host localhost --database mydb --dump film --pks 1 --create-schema

# Apply truncate filter to limit related tables by date range
pgslice --host localhost --database mydb --table customer --pks 42 \
--truncate "rental:rental_date:2024-01-01:2024-12-31" > customer.sql
pgslice --host localhost --database mydb --dump customer --pks 42 \
--truncate "rental:rental_date:2024-01-01:2024-12-31"

# Enable debug logging (writes to stderr)
pgslice --host localhost --database mydb --table users --pks 42 \
--log-level DEBUG 2>debug.log > output.sql
pgslice --host localhost --database mydb --dump users --pks 42 \
--log-level DEBUG 2>debug.log
```
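The auto-generated filename mentioned above follows a `<schema>_<table>_<pks>_<timestamp>.sql` pattern; a hypothetical sketch of the scheme (the actual pgslice naming code may differ):

```shell
# Hypothetical reconstruction of the dump filename scheme
schema=public
table=users
pks=42
ts=$(date -u +%Y%m%d%H%M%S)
dump_file="${schema}_${table}_${pks}_${ts}.sql"
echo "$dump_file"
```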

### Schema Exploration
@@ -140,12 +199,12 @@ Run pgslice on a remote server and capture output locally:
```bash
# Execute on remote server, save output locally
ssh remote.server.com "PGPASSWORD=xxx pgslice --host db.internal --database mydb \
--table users --pks 1 --create-schema" > local_dump.sql
--dump users --pks 1 --create-schema" > local_dump.sql

# With SSH tunnel for database access
ssh -f -N -L 5433:db.internal:5432 bastion.example.com
PGPASSWORD=xxx pgslice --host localhost --port 5433 --database mydb \
--table users --pks 42 > user.sql
--dump users --pks 42 > user.sql
```

### Interactive REPL
@@ -163,43 +222,42 @@ pgslice> describe "film"

Understanding the difference between CLI and REPL modes:

### CLI Mode (stdout by default)
The CLI streams SQL to **stdout** by default, perfect for piping and scripting:
### CLI Mode (files with progress)
The CLI writes to files and shows progress bars (helpful for large datasets):

```bash
# Streams to stdout - redirect with >
pgslice --table users --pks 42 > user_42.sql
# Writes to ~/.pgslice/dumps/public_users_42_TIMESTAMP.sql
pgslice --dump users --pks 42

# Or use --output flag
pgslice --table users --pks 42 --output user_42.sql

# Pipe to other commands
pgslice --table users --pks 42 | gzip > user_42.sql.gz
# Specify output file
pgslice --dump users --pks 42 --output user_42.sql
```

### REPL Mode (files by default)
The REPL writes to **`~/.pgslice/dumps/`** by default when `--output` is not specified:
### REPL Mode (same behavior)
The REPL also writes to **`~/.pgslice/dumps/`** by default:

```bash
# In REPL: writes to ~/.pgslice/dumps/public_users_42.sql
# Writes to ~/.pgslice/dumps/public_users_42_TIMESTAMP.sql
pgslice> dump "users" 42

# Specify custom output path
pgslice> dump "users" 42 --output /path/to/user.sql
```

Both modes now behave identically, always writing dumps to files with visible progress.

### Same Operations, Different Modes

| Operation | CLI | REPL |
|-----------|-----|------|
| **List tables** | `pgslice --tables` | `pgslice> tables` |
| **Describe table** | `pgslice --describe users` | `pgslice> describe "users"` |
| **Dump to stdout** | `pgslice --table users --pks 42` | N/A (REPL always writes to file) |
| **Dump to file** | `pgslice --table users --pks 42 --output user.sql` | `pgslice> dump "users" 42 --output user.sql` |
| **Dump (default)** | Stdout | `~/.pgslice/dumps/public_users_42.sql` |
| **Multiple PKs** | `pgslice --table users --pks 1,2,3` | `pgslice> dump "users" 1,2,3` |
| **Truncate filter** | `pgslice --table users --pks 42 --truncate "orders:2024-01-01:2024-12-31"` | `pgslice> dump "users" 42 --truncate "orders:2024-01-01:2024-12-31"` |
| **Wide mode** | `pgslice --table users --pks 42 --wide` | `pgslice> dump "users" 42 --wide` |
| **Dump (auto-named)** | `pgslice --dump users --pks 42` | `pgslice> dump "users" 42` |
| **Dump to file** | `pgslice --dump users --pks 42 --output user.sql` | `pgslice> dump "users" 42 --output user.sql` |
| **Dump (default path)** | `~/.pgslice/dumps/public_users_42_TIMESTAMP.sql` | `~/.pgslice/dumps/public_users_42_TIMESTAMP.sql` |
| **Multiple PKs** | `pgslice --dump users --pks 1,2,3` | `pgslice> dump "users" 1,2,3` |
| **Truncate filter** | `pgslice --dump users --pks 42 --truncate "orders:2024-01-01:2024-12-31"` | `pgslice> dump "users" 42 --truncate "orders:2024-01-01:2024-12-31"` |
| **Wide mode** | `pgslice --dump users --pks 42 --wide` | `pgslice> dump "users" 42 --wide` |

### When to Use Each Mode
