feat: Data seeding support with environment-specific seed files #37

@Lazialize

Problem

Strata has no built-in mechanism for inserting initial data (master data, test data, etc.) into the database, so developers must run SQL by hand or maintain external scripts. There is also no standardized way to manage environment-specific data, such as test data for development and master data for production.

Proposed Solution

Add a strata seed command to manage initial data through YAML-based seed files.

Seed File Structure

seeds/
├── common/                 # Shared across all environments
│   ├── 001_roles.yaml
│   └── 002_categories.yaml
├── development/            # Development environment only
│   ├── 001_test_users.yaml
│   └── 002_test_posts.yaml
└── production/             # Production environment only
    └── 001_master_data.yaml
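
The layout above implies a discovery order: everything in common/ first, then the files for the selected environment, each sorted by numeric prefix. Below is a minimal sketch of that lookup using only the standard library. The function names `yaml_files_in` and `discover_seed_files` are hypothetical, and treating a missing directory as "no seeds" is an assumption rather than decided behavior.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect `*.yaml` files directly under `dir`, sorted by path so that
/// numeric prefixes such as `001_` and `002_` determine the order.
/// A missing directory yields an empty list (an assumption of this sketch).
fn yaml_files_in(dir: &Path) -> io::Result<Vec<PathBuf>> {
    if !dir.is_dir() {
        return Ok(Vec::new());
    }
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() && path.extension().map_or(false, |ext| ext == "yaml") {
            files.push(path);
        }
    }
    files.sort();
    Ok(files)
}

/// Hypothetical helper: `common/` seeds first, then the seeds for `env`.
fn discover_seed_files(seeds_dir: &Path, env: &str) -> io::Result<Vec<PathBuf>> {
    let mut files = yaml_files_in(&seeds_dir.join("common"))?;
    files.extend(yaml_files_in(&seeds_dir.join(env))?);
    Ok(files)
}
```

Calling `discover_seed_files(Path::new("seeds"), "development")` would return the two common files followed by the two development files, and would leave production/ untouched.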

Seed File Format

version: "1.0"
table: users
mode: upsert         # insert | upsert | replace
unique_by: [email]   # Unique key for upsert/replace
data:
  - name: "Admin User"
    email: "admin@example.com"
    role: "admin"
    created_at: "2026-01-01T00:00:00Z"
  - name: "Test User"
    email: "test@example.com"
    role: "user"

CLI Commands

# Run all seeds (common + current environment)
strata seed

# Run a specific file only
strata seed --file seeds/common/001_roles.yaml

# Specify environment
strata seed --env production

# Dry run
strata seed --dry-run

# Reset (TRUNCATE tables before re-seeding)
strata seed --reset

Implementation Plan

  1. Seed model (src/core/src/core/seed.rs — new)

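    use std::collections::HashMap;

    use serde::Deserialize;

    // Assumes serde derives so a seed file deserializes directly into this struct.
    #[derive(Debug, Deserialize)]
    pub struct SeedFile {
        pub version: String,
        pub table: String,
        pub mode: SeedMode,
        pub unique_by: Option<Vec<String>>, // Unique key for upsert/replace
        pub data: Vec<HashMap<String, serde_json::Value>>,
    }

    // Lowercase names match the `mode:` values in the seed file format.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Deserialize)]
    #[serde(rename_all = "lowercase")]
    pub enum SeedMode {
        Insert,   // Simple insert (error on duplicate)
        Upsert,   // Update if exists, insert if not
        Replace,  // Delete existing, then insert
    }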
  2. Seed execution service (src/db/src/services/seed_executor.rs — new)

    • Read and validate seed files
    • Resolve dependency ordering (execution order based on FK dependencies)
    • Idempotent execution via checksums (avoid re-running unchanged seeds)
    • Dialect-specific SQL generation (INSERT / INSERT ON CONFLICT / REPLACE)
  3. Idempotent execution tracking

    • Manage applied seeds and checksums in a _strata_seeds table
    • Only re-execute seeds whose checksums have changed
  4. CLI command (src/cli/src/cli/commands/seed.rs — new)

    • Add seed subcommand
    • --env, --dry-run, --reset, --file options
  5. Config extension (src/core/src/core/config.rs)

    seeds_dir: seeds     # Default
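
The checksum rule in steps 2 and 3 comes down to one comparison per seed file. Below is a minimal sketch of that decision, assuming the checksums have already been computed and the `_strata_seeds` rows have been loaded into a map. `SeedAction` and `plan_seed_run` are hypothetical names, and hashing and database access are deliberately left out.

```rust
use std::collections::HashMap;

/// What to do with one seed file on this run.
#[derive(Debug, PartialEq, Eq)]
enum SeedAction {
    Apply,   // never applied before
    Reapply, // applied before, but the file content changed
    Skip,    // applied before with the same checksum
}

/// `applied` maps seed path -> checksum recorded in `_strata_seeds`.
/// `current` lists (seed path, checksum of the file on disk) in execution order.
fn plan_seed_run<'a>(
    applied: &HashMap<String, String>,
    current: &'a [(String, String)],
) -> Vec<(&'a str, SeedAction)> {
    current
        .iter()
        .map(|(path, checksum)| {
            let action = match applied.get(path) {
                None => SeedAction::Apply,
                Some(recorded) if recorded == checksum => SeedAction::Skip,
                Some(_) => SeedAction::Reapply,
            };
            (path.as_str(), action)
        })
        .collect()
}
```

Keeping the plan separate from execution would also give --dry-run something to print without touching the database.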

Files Affected

  • src/core/src/core/seed.rs (new)
  • src/db/src/services/seed_executor.rs (new)
  • src/db/src/adapters/database_migrator.rs — Create seeds tracking table
  • src/cli/src/cli/commands/seed.rs (new)
  • src/cli/src/cli/cli.rs — Add seed subcommand
  • src/core/src/core/config.rs — Add seeds_dir setting

Alternatives Considered

  • SQL-based seed files: Dialect-dependent. YAML with a dialect abstraction layer stays consistent across databases
  • CSV import: Limited expressiveness for data types. YAML is better suited to structured data
  • ORM integration: Strata is a schema management tool; coupling with an ORM is out of scope
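
To illustrate the dialect argument, here is a rough sketch of how one upsert seed could render to different SQL per database. The `Dialect` enum, the `upsert_sql` function, and the choice of PostgreSQL and MySQL are assumptions made for illustration, not Strata's actual dialect layer. Identifier quoting is omitted, and the sketch assumes at least one column is not part of unique_by.

```rust
#[derive(Clone, Copy)]
enum Dialect {
    Postgres,
    MySql,
}

/// Build a parameterized upsert statement for a single row.
/// Columns listed in `unique_by` are the conflict key; all others are updated.
fn upsert_sql(dialect: Dialect, table: &str, columns: &[&str], unique_by: &[&str]) -> String {
    let cols = columns.join(", ");
    let update_cols: Vec<&str> = columns
        .iter()
        .copied()
        .filter(|c| !unique_by.contains(c))
        .collect();
    match dialect {
        Dialect::Postgres => {
            // PostgreSQL uses numbered placeholders and an explicit conflict target.
            let placeholders: Vec<String> =
                (1..=columns.len()).map(|i| format!("${}", i)).collect();
            let updates: Vec<String> = update_cols
                .iter()
                .map(|c| format!("{} = EXCLUDED.{}", c, c))
                .collect();
            format!(
                "INSERT INTO {} ({}) VALUES ({}) ON CONFLICT ({}) DO UPDATE SET {}",
                table,
                cols,
                placeholders.join(", "),
                unique_by.join(", "),
                updates.join(", ")
            )
        }
        Dialect::MySql => {
            // MySQL has no conflict target in the statement; it relies on a
            // unique index already existing on the `unique_by` columns.
            let placeholders = vec!["?"; columns.len()].join(", ");
            let updates: Vec<String> = update_cols
                .iter()
                .map(|c| format!("{} = VALUES({})", c, c))
                .collect();
            format!(
                "INSERT INTO {} ({}) VALUES ({}) ON DUPLICATE KEY UPDATE {}",
                table,
                cols,
                placeholders,
                updates.join(", ")
            )
        }
    }
}
```

The same seed file (table users, unique_by [email]) would drive both branches, which is the consistency that hand-written SQL seed files cannot offer.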

Additional Context

  • Corresponds to the entire "Data Seeding" section in ROADMAP.md
  • Idempotent execution via checksum management can reuse the existing migration checksum infrastructure (sha2 crate)
  • FK dependency ordering can reference the MigrationPipeline circular dependency detection logic
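
For reference, FK-based ordering can be expressed as a topological sort. The sketch below uses Kahn's algorithm over a table-to-referenced-tables map. It is a generic illustration, not the existing MigrationPipeline logic, and the `seed_order` name and its input shape are hypothetical. Self-referencing foreign keys are ignored for table ordering, which is an assumption of this sketch.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// `deps` maps each table to the tables it references through foreign keys.
/// Returns tables ordered so that referenced tables come first, or `Err`
/// with the tables that are part of (or blocked by) a dependency cycle.
fn seed_order(deps: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, Vec<String>> {
    // Collect every table, including ones that appear only as a reference target.
    let mut tables: BTreeSet<&str> = deps.keys().copied().collect();
    for targets in deps.values() {
        tables.extend(targets.iter().copied());
    }

    // remaining[t] = number of referenced tables that t is still waiting for.
    let mut remaining: BTreeMap<&str, usize> = tables.iter().map(|t| (*t, 0)).collect();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (table, targets) in deps {
        for target in targets {
            if target == table {
                continue; // self-reference does not affect table order
            }
            *remaining.get_mut(table).unwrap() += 1;
            dependents.entry(*target).or_default().push(*table);
        }
    }

    // Start with tables that reference nothing, then release their dependents.
    let mut ready: VecDeque<&str> = remaining
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(t, _)| *t)
        .collect();
    let mut order = Vec::new();
    while let Some(table) = ready.pop_front() {
        order.push(table.to_string());
        for dependent in dependents.get(table).into_iter().flatten() {
            let n = remaining.get_mut(dependent).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push_back(*dependent);
            }
        }
    }

    if order.len() == tables.len() {
        Ok(order)
    } else {
        // Anything never released is in a cycle or depends on one.
        Err(remaining
            .into_iter()
            .filter(|(_, n)| *n > 0)
            .map(|(t, _)| t.to_string())
            .collect())
    }
}
```

Using ordered maps keeps the result deterministic for a given schema, which matters if the order feeds into --dry-run output.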

Labels

enhancement: New feature or request
