Copilot AI commented Oct 7, 2025

Summary

Removed all files from the default_data/ directory to eliminate potential security vulnerabilities from committed seed data, default configuration values, and test credentials. The directory structure is preserved with a .gitkeep file, and new sections in SECURITY.md and SETUP_GUIDE.md guide developers to secure alternatives for data seeding.

Motivation

The default_data/ directory previously contained 27 JSON and SQL files with seed data including:

  • AI prompt templates and metrics
  • Sample articles, products, and images
  • Email templates and internationalization data
  • Database snapshots and configuration files

These files posed potential security risks:

  • Potential exposure of sensitive configuration values
  • Risk of accidentally committing production data
  • Default credentials or API keys in test data
  • Security vulnerabilities from stale sample data

Changes Made

1. File Removal (7,337 lines)

Removed all 27 tracked files from default_data/ including:

  • aiprompts.json, ai_metrics.json
  • articles.json, products.json, images.json
  • email_templates.json, internationalisations.json
  • Database snapshots and other seed data files

2. Directory Preservation

Added default_data/.gitkeep with inline documentation explaining:

  • The directory's purpose for import/export commands
  • Policy against committing files to version control
  • Direction to use secure alternatives

3. .gitignore Enhancement

Updated .gitignore to prevent future commits:

# Default Data
# Ignore all files in default_data directory, but keep the directory structure
default_data/*
# Allow .gitkeep to preserve the directory
!default_data/.gitkeep

Verified with test files: new files in default_data/ are ignored by git.
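The same check can be reproduced in a throwaway repository with git check-ignore, which exits 0 when a path is ignored and 1 when it is not. This is a minimal sketch assuming git is on the PATH; the function name and sample.json are hypothetical, and only the two ignore rules come from this PR.

```shell
#!/bin/sh
# Minimal sketch: verify the default_data/ ignore rules in a throwaway repository.
# check_default_data_ignore_rules and sample.json are hypothetical names.
check_default_data_ignore_rules() {
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q .
    # The same two rules as in the .gitignore change
    printf '%s\n' 'default_data/*' '!default_data/.gitkeep' > .gitignore
    mkdir default_data
    touch default_data/.gitkeep default_data/sample.json
    # git check-ignore -q exits 0 if the path is ignored and 1 if it is not
    if ! git check-ignore -q default_data/sample.json; then
      echo "FAIL: default_data/sample.json is not ignored" >&2
      exit 1
    fi
    if git check-ignore -q default_data/.gitkeep; then
      echo "FAIL: default_data/.gitkeep is ignored" >&2
      exit 1
    fi
    echo "OK: default_data/* is ignored and .gitkeep is kept"
  )
  status=$?
  rm -rf "$repo"
  return "$status"
}

check_default_data_ignore_rules
```

Because the ignore pattern targets default_data/* rather than the directory itself, the negated .gitkeep rule can re-include that one file.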

4. Documentation Updates

SECURITY.md - Added "Default Data and Seed Files Policy" section:

  • ⚠️ Warning against committing seed data to default_data/
  • Lists security risks clearly
  • Provides 4 recommended secure alternatives:
    1. Database Migrations - Use CakePHP migrations for structure and seeds
    2. Environment Variables - Store config in .env files (never committed)
    3. Admin UI - Create initial data through the application interface
    4. Runtime Fixtures - Generate test data programmatically
  • Documents proper usage of import/export commands for local development
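For alternative 2, a minimal shell sketch of reading configuration from an uncommitted .env file and failing fast when a value is missing, rather than falling back to committed defaults. The helper names (load_env_file, require_env) and all variable names are hypothetical and are not part of the project's scripts.

```shell
#!/bin/sh
# Minimal sketch: read configuration from an uncommitted .env file and fail fast
# when a required value is missing, instead of relying on committed defaults.
# load_env_file, require_env, and all variable names are hypothetical.

load_env_file() {
  env_file=$1
  case $env_file in
    */*) ;;                     # already contains a path
    *) env_file=./$env_file ;;  # POSIX "." searches PATH for bare file names
  esac
  if [ ! -f "$env_file" ]; then
    echo "Warning: $env_file not found" >&2
    return 1
  fi
  set -a          # export every variable assigned while sourcing
  . "$env_file"
  set +a
}

require_env() {
  missing=0
  for name in "$@"; do
    # printenv prints nothing (and exits non-zero) for unset variables
    if [ -z "$(printenv "$name")" ]; then
      echo "Error: required environment variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `load_env_file .env && require_env DB_PASSWORD SECURITY_SALT` (hypothetical names) would stop a script before it runs with incomplete configuration.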

SETUP_GUIDE.md - Added "Data Seeding" section:

  • Explains why default_data/ is intentionally empty
  • Notes that import warnings during setup are expected and safe
  • Provides practical examples for local development workflows
  • Links to SECURITY.md for complete policy details

Backward Compatibility

No breaking changes: all existing code remains functional.

  • Application startup (run_dev_env.sh): Import commands are wrapped in conditional blocks that show warnings on failure, not errors. The application starts successfully with empty default_data/.

  • Import/Export commands: DefaultDataImportCommand and DefaultDataExportCommand handle missing files gracefully:

    • Return a warning (with a success exit code) when the directory is empty
    • Return an error when a specific file is requested but missing; the startup scripts catch this and continue
  • Development workflow: Import/export commands remain available for developers who maintain local seed files (which are now properly ignored by git).

  • Tests: No test files reference default_data/, so the test suite is unaffected.
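The warn-instead-of-fail behavior described for run_dev_env.sh can be sketched as a small wrapper. This is a hypothetical helper, not the script's actual code; the commented example reuses the import command from the migration guide.

```shell
#!/bin/sh
# Minimal sketch of the "warn, don't fail" pattern described for run_dev_env.sh.
# run_optional is a hypothetical helper; the real script may be structured differently.
run_optional() {
  description=$1
  shift
  # Running the command as an if-condition keeps `set -e` from aborting the script
  if "$@"; then
    echo "$description: done"
  else
    status=$?
    echo "Warning: $description failed (exit code $status); continuing startup" >&2
  fi
  return 0  # optional seed data must never block startup
}

# Example, using the import command from the migration guide (needs the running container):
# run_optional "Import default data" docker compose exec willowcms bin/cake default_data_import
```

Returning 0 unconditionally is the design choice that lets startup proceed with an empty default_data/, while the warning on stderr still surfaces the failure.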

Migration Guide

For New Developers

  1. Clone the repository; default_data/ contains only .gitkeep
  2. Run ./run_dev_env.sh
  3. Import warnings are expected and safe; the application still starts normally
  4. Create data via Admin UI or database migrations as needed

For Existing Developers with Local Seed Data

If you have local seed files you want to preserve:

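# Before pulling these changes: export your local data and copy it outside the repository
docker compose exec willowcms bin/cake default_data_export
cp -r default_data ../default_data_backup

# Discard local edits to tracked files so the pull can delete them, then pull.
# The pull removes every previously tracked file from default_data/.
git checkout -- default_data/
git pull

# Restore your copies; default_data/ is now ignored by git
cp -r ../default_data_backup/. default_data/

# Import when needed
docker compose exec willowcms bin/cake default_data_import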

Security Impact

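Eliminated from the current tree:

  • ❌ Exposure of sensitive configuration values in tracked files
  • ❌ Risk of committing production data
  • ❌ Default credentials in the checked-out repository
  • ❌ Security vulnerabilities from outdated test data

Note: the deleted files remain in repository history. Removing them completely requires rewriting history (for example with git filter-repo), and any credential that was ever committed should be rotated.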
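Deleting files in a new commit does not erase them from earlier commits. A minimal sketch that demonstrates this in a throwaway repository, assuming git is installed; the file name, its contents, and the function name are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: a file deleted in a new commit is still readable from history.
# Runs in a throwaway repository; the file name and contents are hypothetical.
show_deleted_file_from_history() {
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q .
    git config user.name "Demo"
    git config user.email "demo@example.com"
    git config commit.gpgsign false
    mkdir default_data
    echo '{"api_key": "example-only"}' > default_data/config.json
    git add default_data/config.json
    git commit -q -m "Add seed data"
    git rm -q default_data/config.json
    git commit -q -m "Remove seed data"
    # Gone from the working tree...
    [ ! -e default_data/config.json ] || exit 1
    # ...but still stored in the previous commit
    git show HEAD~1:default_data/config.json
  )
  status=$?
  rm -rf "$repo"
  return "$status"
}

show_deleted_file_from_history
```

In the real repository, `git log --oneline -- default_data/` lists the commits that touched the directory, including those that still contain the removed files.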

Preserved:

  • ✅ Full application functionality
  • ✅ Development workflow (with documented alternatives)
  • ✅ Import/export commands for local use
  • ✅ All existing tests pass

Verification

  • .gitignore prevents new files in default_data/ from being committed
  • .gitkeep is tracked and preserves directory structure
  • Application startup scripts handle missing files gracefully
  • No tests depend on default_data/ files
  • Documentation clearly explains the policy and alternatives

Files changed: 31 | Lines added: 62 | Lines removed: 7,337

Original prompt

This pull request was created as a result of the following prompt from Copilot chat.

Goal: Remove all files from the default_data directory to eliminate potential security exposure of default configuration or secret-like values, and prevent reintroduction.

Context:

  • Repository: garzarobm/willow
  • Directory to clean: default_data (see: https://github.com/garzarobm/willow/tree/main/default_data)
  • Request from maintainer: “I need to remove all files from default_data to ensure no security vulnerabilities exist.”

Scope of work:

  1. Remove default_data contents from version control

    • Delete all tracked files under default_data/.
    • Keep the directory itself to avoid possible runtime/path assumptions by the app, using a .gitkeep placeholder.
  2. Prevent reintroduction of files under default_data/

    • Add default_data/ to .gitignore so new files aren’t accidentally committed.
    • Ensure .gitignore rules allow keeping .gitkeep if we retain the folder (e.g., ignore default_data/** then negate default_data/.gitkeep).
  3. Documentation update

    • In README or a SECURITY/CONTRIBUTING section, add a short note explaining that default_data must not contain production or secret values and should not be committed. Point readers to use environment variables, database migrations/seeds, or admin UI for seeding data.
  4. Code audit for references

    • Search for references to default_data in the codebase.
    • If any runtime or tests depend on specific files being present, update code to handle the folder being empty and/or adjust tests to use fixtures or factories instead of repo-tracked files.
  5. Validation

    • Run the application (or applicable linters/tests) to ensure removing contents doesn’t break startup or critical flows.

Deliverables in the PR:

  • Removal of all files under default_data/.
  • Addition of .gitkeep under default_data/ to preserve the directory path.
  • Update to .gitignore to ignore default_data contents but keep .gitkeep.
  • Documentation update (README.md or docs) noting the policy and guidance.

Notes:

  • Do not remove or modify unrelated files.
  • If any tests or scripts rely on specific default_data files, update them to use ephemeral test fixtures created at runtime.

Co-authored-by: garzarobm <20546156+garzarobm@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Remove all files from default_data directory for security" to "Remove default_data files to prevent security exposure of configuration and secrets" on Oct 7, 2025
Copilot AI requested a review from garzarobm October 7, 2025 00:38