Skip to content

Dockerize and document a reproducible local setup (load halo-data.dump and run training queries) #5

@jlwaldeck

Description

@jlwaldeck

Summary
Make it one-command easy for newcomers to spin up Postgres, load the provided halo-data.dump, and run the training queries in self-join-training/ and window-function-training/. This will reduce onboarding friction and ensure everyone runs against the same schema/data.

Why

  • The repo includes a Postgres dump (halo-data.dump) and schema SQL (base-data-model/tables), but there are no clear, end-to-end setup instructions or automation to restore and verify the dataset.
  • Having a reproducible environment will help students focus on SQL concepts rather than local setup.

Proposed changes

  1. Add Dockerized Postgres

    • Add docker-compose.yml that starts a Postgres service (e.g., postgres:16) with a named volume.
    • Expose port 5432; set POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD via .env.
    • Optionally add a simple healthcheck.
  2. Bootstrap database with schema + data

    • Add scripts/init/01_schema.sql that stitches together base-data-model/tables/*.sql (or a single combined schema.sql) to create all tables.
    • Add scripts/init/02_restore.sh that loads halo-data.dump using pg_restore/psql after the DB is up. If halo-data.dump already includes schema, document which step comes first and whether schema-only or data-only restore is needed.
    • Provide a Makefile (or package.json scripts) with: make up, make down, make reset, make psql, make restore.
  3. Update README with Quickstart

    • Prereqs: Docker, docker-compose, optional psql.
    • One-liners:
      • cp .env.example .env
      • docker compose up -d
      • make restore (or ./scripts/init/02_restore.sh)
    • How to connect:
      • psql "postgresql://$POSTGRES_USER:$POSTGRES_PASSWORD@localhost:5432/$POSTGRES_DB"
    • How to run training queries:
      • psql -f self-join-training/self-join-analysis-query.sql
      • psql -f window-function-training/hot_and_cold_ratios.sql
      • psql -f window-function-training/win_and_lose_streaks.sql
      • psql -f window-function-training/rank_filter_streaks.sql
    • Link to database_model_diagram.jpg and briefly explain the main tables.
  4. Add verification checks

    • scripts/verify.sql to sanity-check row counts per table (players, matches, match_details, medals, medals_matches_players, game_variants, maps, etc.).
    • Document expected counts or at least shape (non-zero rows) so users know the restore worked.
  5. Optional: lightweight sample dataset

    • If halo-data.dump is large, include a small sample dump (or INSERT-based seed) to keep the Quickstart fast, and mention the full dump as an optional download step.

Acceptance criteria

  • docker compose up -d followed by a single command (make restore or ./scripts/init/02_restore.sh) creates a working database with the schema and data loaded.
  • README has a Quickstart section with copy/paste commands and troubleshooting notes.
  • Running each SQL file in self-join-training/ and window-function-training/ succeeds without modification.
  • A verify script confirms that key tables are populated.

Notes

  • Please specify whether the target is strictly PostgreSQL (assumed from .dump) and any version-specific features used.
  • If indexes or constraints are needed for performance with the training queries, document or add them in the schema step.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions