diff --git a/data/backup/knowledge_backup.db b/data/backup/knowledge_backup.db
deleted file mode 100644
index 5b9c3ff..0000000
Binary files a/data/backup/knowledge_backup.db and /dev/null differ
diff --git a/data/backup/knowledge_detailed_description_2025-04-21T00-33-06-807Z.db b/data/backup/knowledge_detailed_description_2025-04-21T00-33-06-807Z.db
deleted file mode 100644
index 8385d52..0000000
Binary files a/data/backup/knowledge_detailed_description_2025-04-21T00-33-06-807Z.db and /dev/null differ
diff --git a/docs/security/GIT_HISTORY_CLEANUP.md b/docs/security/GIT_HISTORY_CLEANUP.md
new file mode 100644
index 0000000..e446c05
--- /dev/null
+++ b/docs/security/GIT_HISTORY_CLEANUP.md
@@ -0,0 +1,113 @@
+# Security: Git History Cleanup Guide
+
+## Problem
+
+Several files containing sensitive data were committed to the repository before `.gitignore` rules were added:
+
+| File | Issue | Status |
+|------|-------|--------|
+| `.env` | Contains `JWT_SECRET`, `SESSION_SECRET`, admin credentials, OAuth secrets | Removed from tracking, **still in git history** |
+| `knowledge.db` | SQLite database at repo root | Removed from tracking in this PR |
+| `data/backup/knowledge_backup.db` | 7.2 MB database backup | Removed from tracking in this PR |
+| `data/backup/knowledge_detailed_description_*.db` | 14 MB database backup | Removed from tracking in this PR |
+
+Even though these files are now in `.gitignore` and untracked, **anyone with repo access can still retrieve the secrets from git history** using commands like:
+
+```bash
+git log --all --oneline -- .env
+git show :.env
+```
+
+## Solution
+
+### Step 1: Clean Git History (repo owner action required)
+
+Git history rewriting can only be done by the repo owner. We've provided an automated script:
+
+```bash
+# 1. Make a FRESH clone (required by git-filter-repo)
+git clone https://github.com/iamtouchskyer/math-project-server.git math-project-server-clean
+cd math-project-server-clean
+
+# 2. Install git-filter-repo
+pip install git-filter-repo
+
+# 3. Run the cleanup script
+bash scripts/clean-git-history.sh
+
+# 4. Force-push
+git push origin --force --all
+git push origin --force --tags
+```
+
+### Step 2: Rotate All Exposed Credentials
+
+**This is critical** — even after cleaning git history, the secrets may have been cached by GitHub, cloned by others, or indexed by bots. Rotate everything:
+
+- [ ] `JWT_SECRET` — Generate a new random 256-bit key
+- [ ] `SESSION_SECRET` — Generate a new random 256-bit key
+- [ ] `ADMIN_USERNAME` / `ADMIN_PASSWORD` — Change admin credentials
+- [ ] `GOOGLE_CLIENT_ID` / `GOOGLE_CLIENT_SECRET` — Rotate in Google Cloud Console
+- [ ] `STRIPE_SECRET_KEY` / `STRIPE_PUBLISHABLE_KEY` — Roll keys in Stripe Dashboard
+- [ ] `APPLE_CLIENT_ID` / `APPLE_TEAM_ID` / `APPLE_KEY_ID` — Update in Apple Developer
+- [ ] `WECHAT_APP_ID` / `WECHAT_APP_SECRET` — Rotate in WeChat Developer Platform
+
+### Step 3: Notify Collaborators
+
+After force-pushing, all existing clones will be out of sync. Collaborators need to:
+
+```bash
+# Option A: Re-clone (recommended)
+rm -rf math-project-server
+git clone https://github.com/iamtouchskyer/math-project-server.git
+
+# Option B: Reset existing clone
+cd math-project-server
+git fetch origin
+git reset --hard origin/main
+```
+
+### Step 4: Verify
+
+```bash
+# These should all return no results:
+git log --all --full-history -- .env
+git log --all --full-history -- .env.development
+git log --all --full-history -- knowledge.db
+git log --all --full-history -- data/backup/knowledge_backup.db
+```
+
+## Prevention
+
+The following safeguards are now in place:
+
+1. **`.gitignore`** — Blocks `.env*`, `*.db`, `*.sqlite`, `*.sqlite3` files
+2. **`.env.example`** — Provides a template with placeholder values
+3. **This guide** — Documents the cleanup process
+
+### Recommended: Add a Pre-commit Hook
+
+Add this to `.git/hooks/pre-commit` to prevent accidental commits of sensitive files:
+
+```bash
+#!/bin/bash
+# Prevent committing sensitive files
+SENSITIVE_PATTERNS=("\.env$" "\.env\." "\.db$" "\.sqlite" "secret" "password")
+STAGED_FILES=$(git diff --cached --name-only)
+
+for file in $STAGED_FILES; do
+  for pattern in "${SENSITIVE_PATTERNS[@]}"; do
+    if echo "$file" | grep -qiE "$pattern"; then
+      echo "❌ Blocked: '$file' matches sensitive pattern '$pattern'"
+      echo " If intentional, use: git commit --no-verify"
+      exit 1
+    fi
+  done
+done
+```
+
+## References
+
+- [git-filter-repo documentation](https://github.com/newren/git-filter-repo)
+- [GitHub: Removing sensitive data](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository)
+- [BFG Repo-Cleaner](https://rtyley.github.io/bfg-repo-cleaner/) (alternative tool)
diff --git a/knowledge.db b/knowledge.db
deleted file mode 100644
index 7d253e9..0000000
Binary files a/knowledge.db and /dev/null differ
diff --git a/scripts/clean-git-history.sh b/scripts/clean-git-history.sh
new file mode 100755
index 0000000..d417ff5
--- /dev/null
+++ b/scripts/clean-git-history.sh
@@ -0,0 +1,126 @@
+#!/usr/bin/env bash
+# =============================================================================
+# clean-git-history.sh — Remove sensitive files from git history
+# =============================================================================
+#
+# This script uses git-filter-repo to permanently remove files that were
+# accidentally committed to git history, including:
+#   - .env files (containing secrets like JWT_SECRET, SESSION_SECRET, etc.)
+#   - .db files (SQLite database backups)
+#
+# ⚠️ IMPORTANT: This rewrites git history. All collaborators must re-clone
+# the repository after this is run.
+#
+# Prerequisites:
+#   pip install git-filter-repo
+#
+# Usage:
+#   1. Make a FRESH clone of the repo (required by git-filter-repo)
+#      git clone https://github.com/iamtouchskyer/math-project-server.git math-project-server-clean
+#      cd math-project-server-clean
+#
+#   2. Run this script:
+#      bash scripts/clean-git-history.sh
+#
+#   3. Verify the cleanup worked:
+#      git log --all --full-history -- .env # should return nothing
+#      git log --all --full-history -- knowledge.db # should return nothing
+#
+#   4. Force-push to GitHub:
+#      git push origin --force --all
+#      git push origin --force --tags
+#
+#   5. ROTATE ALL CREDENTIALS that were exposed:
+#      - JWT_SECRET
+#      - SESSION_SECRET
+#      - Google OAuth credentials
+#      - Stripe API keys
+#      - Apple Sign-In configuration
+#      - WeChat OAuth secrets
+#      - Admin credentials
+#
+# =============================================================================
+
+set -euo pipefail
+
+# Colors for output
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+NC='\033[0m' # No Color
+
+echo -e "${YELLOW}╔══════════════════════════════════════════════════════╗${NC}"
+echo -e "${YELLOW}║ Git History Cleanup — Removing Sensitive Files ║${NC}"
+echo -e "${YELLOW}╚══════════════════════════════════════════════════════╝${NC}"
+echo ""
+
+# Check prerequisites
+if ! command -v git-filter-repo &> /dev/null; then
+  echo -e "${RED}Error: git-filter-repo is not installed.${NC}"
+  echo "Install it with: pip install git-filter-repo"
+  exit 1
+fi
+
+# Verify we're in a git repo
+if ! git rev-parse --is-inside-work-tree &> /dev/null; then
+  echo -e "${RED}Error: Not in a git repository.${NC}"
+  exit 1
+fi
+
+# Show what will be removed
+echo -e "${YELLOW}Files to remove from history:${NC}"
+echo ""
+
+FILES_TO_REMOVE=(
+  ".env"
+  ".env.development"
+  ".env.local"
+  "knowledge.db"
+  "data/backup/knowledge_backup.db"
+  "data/backup/knowledge_detailed_description_2025-04-21T00-33-06-807Z.db"
+)
+
+for file in "${FILES_TO_REMOVE[@]}"; do
+  COMMITS=$(git log --all --oneline -- "$file" 2>/dev/null | wc -l)
+  if [ "$COMMITS" -gt 0 ]; then
+    echo -e " ${RED}✗${NC} $file (found in $COMMITS commits)"
+  else
+    echo -e " ${GREEN}✓${NC} $file (not found — already clean)"
+  fi
+done
+
+echo ""
+echo -e "${YELLOW}⚠️ This will PERMANENTLY rewrite git history.${NC}"
+echo -e "${YELLOW} All collaborators must re-clone after this.${NC}"
+echo ""
+read -p "Continue? (yes/no): " CONFIRM
+
+if [ "$CONFIRM" != "yes" ]; then
+  echo "Aborted."
+  exit 0
+fi
+
+echo ""
+echo -e "${GREEN}Running git-filter-repo...${NC}"
+
+# Build the filter-repo arguments
+ARGS=""
+for file in "${FILES_TO_REMOVE[@]}"; do
+  ARGS="$ARGS --path $file"
+done
+
+# Run git-filter-repo to remove the files
+git filter-repo --invert-paths $ARGS --force
+
+echo ""
+echo -e "${GREEN}╔══════════════════════════════════════════════════════╗${NC}"
+echo -e "${GREEN}║ ✅ Cleanup complete! ║${NC}"
+echo -e "${GREEN}╚══════════════════════════════════════════════════════╝${NC}"
+echo ""
+echo -e "Next steps:"
+echo -e " 1. Verify: ${YELLOW}git log --all --full-history -- .env${NC}"
+echo -e " 2. Re-add remote: ${YELLOW}git remote add origin https://github.com/iamtouchskyer/math-project-server.git${NC}"
+echo -e " 3. Force-push: ${YELLOW}git push origin --force --all && git push origin --force --tags${NC}"
+echo -e " 4. ${RED}ROTATE ALL EXPOSED CREDENTIALS${NC}"
+echo -e " 5. Tell collaborators to re-clone"
+echo ""