Binary file removed data/backup/knowledge_backup.db
Binary file not shown.
Binary file not shown.
113 changes: 113 additions & 0 deletions docs/security/GIT_HISTORY_CLEANUP.md
@@ -0,0 +1,113 @@
# Security: Git History Cleanup Guide

## Problem

Several files containing sensitive data were committed to the repository before `.gitignore` rules were added:

| File | Issue | Status |
|------|-------|--------|
| `.env` | Contains `JWT_SECRET`, `SESSION_SECRET`, admin credentials, OAuth secrets | Removed from tracking, **still in git history** |
| `knowledge.db` | SQLite database at repo root | Removed from tracking in this PR |
| `data/backup/knowledge_backup.db` | 7.2 MB database backup | Removed from tracking in this PR |
| `data/backup/knowledge_detailed_description_*.db` | 14 MB database backup | Removed from tracking in this PR |

Even though these files are now in `.gitignore` and untracked, **anyone with repo access can still retrieve the secrets from git history** using commands like:

```bash
git log --all --oneline -- .env
git show <commit-hash>:.env
```
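To gauge the full extent of the exposure, it can help to list every sensitive-looking path that was ever added anywhere in history, not only `.env`. The following is a minimal sketch under assumptions: the function name `filter_sensitive_paths` and its patterns are illustrative and not part of this PR, and `.env.example` is deliberately excluded because it holds placeholders only.

```bash
#!/usr/bin/env bash
# Illustrative helper (not part of this PR): reads file paths on stdin and
# prints the unique ones that look like env files or SQLite databases.
filter_sensitive_paths() {
  grep -E '(^|/)\.env(\.[^/]*)?$|\.(db|sqlite|sqlite3)$' \
    | grep -vE '(^|/)\.env\.example$' \
    | LC_ALL=C sort -u
}
```

Inside the repository, it could be fed with the list of every path ever added: `git log --all --diff-filter=A --name-only --pretty=format: | filter_sensitive_paths`.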

## Solution

### Step 1: Clean Git History (repo owner action required)

Rewriting published history requires permission to force-push to the remote, so the repo owner (or another admin) needs to run this step. We've provided an automated script:

```bash
# 1. Make a FRESH clone (required by git-filter-repo)
git clone https://github.com/iamtouchskyer/math-project-server.git math-project-server-clean
cd math-project-server-clean

# 2. Install git-filter-repo
pip install git-filter-repo

# 3. Run the cleanup script
bash scripts/clean-git-history.sh

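# 4. Re-add the remote (git-filter-repo removes it), then force-push
git remote add origin https://github.com/iamtouchskyer/math-project-server.git
git push origin --force --all
git push origin --force --tags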
```

### Step 2: Rotate All Exposed Credentials

**This is critical.** Even after the history is cleaned, the secrets may already have been cloned by others, indexed by bots, or retained in GitHub's cached views and pull request refs. Treat every value below as compromised and rotate it:

- [ ] `JWT_SECRET` — Generate a new random 256-bit key
- [ ] `SESSION_SECRET` — Generate a new random 256-bit key
- [ ] `ADMIN_USERNAME` / `ADMIN_PASSWORD` — Change admin credentials
- [ ] `GOOGLE_CLIENT_ID` / `GOOGLE_CLIENT_SECRET` — Rotate in Google Cloud Console
- [ ] `STRIPE_SECRET_KEY` / `STRIPE_PUBLISHABLE_KEY` — Roll keys in Stripe Dashboard
- [ ] `APPLE_CLIENT_ID` / `APPLE_TEAM_ID` / `APPLE_KEY_ID` — Update in Apple Developer
- [ ] `WECHAT_APP_ID` / `WECHAT_APP_SECRET` — Rotate in WeChat Developer Platform
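
For the two 256-bit keys in the checklist, one possible way to generate a value is sketched below. The function name `generate_secret_hex` is an illustrative assumption; the sketch reads 32 bytes from `/dev/urandom` and hex-encodes them. Where OpenSSL is installed, `openssl rand -hex 32` should give an equivalent result.

```bash
#!/usr/bin/env bash
# Illustrative helper: prints 32 random bytes (256 bits) as 64 hex characters.
generate_secret_hex() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}
```

A new value can then be placed in the untracked `.env`, for example `JWT_SECRET=$(generate_secret_hex)`. Use a separate call for `SESSION_SECRET` so the two keys differ.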

### Step 3: Notify Collaborators

After the force-push, every existing clone still contains the old history, and pushing from one would reintroduce the removed files. Collaborators need to:

```bash
# Option A: Re-clone (recommended)
rm -rf math-project-server
git clone https://github.com/iamtouchskyer/math-project-server.git

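# Option B: Reset existing clone (repeat the reset for every local branch)
cd math-project-server
git fetch origin
git reset --hard origin/main
# Drop the old commits, which otherwise survive locally in the reflog
git reflog expire --expire=now --all
git gc --prune=now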
```

### Step 4: Verify

```bash
# These should all return no results:
git log --all --full-history -- .env
git log --all --full-history -- .env.development
git log --all --full-history -- knowledge.db
git log --all --full-history -- data/backup/knowledge_backup.db
```
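
The checks above could also be wrapped in a small function that fails loudly, which is easier to act on than scanning for empty output. This is a sketch under assumptions: `verify_paths_absent` is a hypothetical name, and it counts commits with `git rev-list --all --count -- <path>` rather than reading `git log` output.

```bash
#!/usr/bin/env bash
# Illustrative helper: returns non-zero if any given path still appears in
# the history of any ref in the current repository.
verify_paths_absent() {
  local status=0 path count
  for path in "$@"; do
    count=$(git rev-list --all --count -- "$path")
    if [ "$count" -gt 0 ]; then
      echo "STILL PRESENT: $path ($count commits)"
      status=1
    fi
  done
  return "$status"
}
```

Run it in the rewritten clone before force-pushing, for example `verify_paths_absent .env .env.development knowledge.db data/backup/knowledge_backup.db && echo "history is clean"`.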

## Prevention

The following safeguards are now in place:

1. **`.gitignore`** — Blocks `.env*`, `*.db`, `*.sqlite`, `*.sqlite3` files
2. **`.env.example`** — Provides a template with placeholder values
3. **This guide** — Documents the cleanup process

### Recommended: Add a Pre-commit Hook

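Save this as `.git/hooks/pre-commit` and make it executable (`chmod +x .git/hooks/pre-commit`) to prevent accidental commits of sensitive files. Hooks are local to each clone, so every contributor has to install it: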

```bash
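#!/bin/bash
# Prevent committing sensitive files.
SENSITIVE_PATTERNS=('\.env$' '\.env\.' '\.db$' '\.sqlite' 'secret' 'password')

# Only check added, copied, modified, or renamed files, so that removing a
# sensitive file from tracking is still allowed.
while IFS= read -r file; do
  # The placeholder template is meant to be committed.
  [[ "$file" == *.env.example ]] && continue
  for pattern in "${SENSITIVE_PATTERNS[@]}"; do
    if echo "$file" | grep -qiE "$pattern"; then
      echo "❌ Blocked: '$file' matches sensitive pattern '$pattern'"
      echo "   If intentional, use: git commit --no-verify"
      exit 1
    fi
  done
done < <(git diff --cached --name-only --diff-filter=ACMR)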
```
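
The hook above matches file names only, so a secret pasted into an ordinary source file would pass. A complementary content check might look like the sketch below. Both the function name `diff_adds_secret` and the keyword list (`SECRET`, `PASSWORD`, `API_KEY`) are illustrative assumptions that would need tuning for this codebase.

```bash
#!/usr/bin/env bash
# Illustrative helper: reads a unified diff on stdin and succeeds (exit 0) if
# any added line assigns a non-empty value to a secret-looking variable.
diff_adds_secret() {
  # Drop the "+++ b/<file>" headers, then inspect added lines only.
  grep -vE '^\+\+\+ ' \
    | grep -qE '^\+.*(SECRET|PASSWORD|API_KEY)[A-Z_]*[[:space:]]*=[[:space:]]*[^[:space:]]'
}
```

In a hook, this could be used as `git diff --cached -U0 | diff_adds_secret && { echo "Blocked: staged changes assign a secret-looking value"; exit 1; }`. A check like this will also flag placeholder assignments such as those in `.env.example`, so that file would need an exemption.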

## References

- [git-filter-repo documentation](https://github.com/newren/git-filter-repo)
- [GitHub: Removing sensitive data](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository)
- [BFG Repo-Cleaner](https://rtyley.github.io/bfg-repo-cleaner/) (alternative tool)
Binary file removed knowledge.db
Binary file not shown.
126 changes: 126 additions & 0 deletions scripts/clean-git-history.sh
@@ -0,0 +1,126 @@
#!/usr/bin/env bash
# =============================================================================
# clean-git-history.sh — Remove sensitive files from git history
# =============================================================================
#
# This script uses git-filter-repo to permanently remove files that were
# accidentally committed to git history, including:
# - .env files (containing secrets like JWT_SECRET, SESSION_SECRET, etc.)
# - .db files (SQLite database backups)
#
# ⚠️ IMPORTANT: This rewrites git history. All collaborators must re-clone
# the repository after this is run.
#
# Prerequisites:
# pip install git-filter-repo
#
# Usage:
#   1. Make a FRESH clone of the repo (required by git-filter-repo)
#        git clone https://github.com/iamtouchskyer/math-project-server.git math-project-server-clean
#        cd math-project-server-clean
#
#   2. Run this script:
#        bash scripts/clean-git-history.sh
#
#   3. Verify the cleanup worked:
#        git log --all --full-history -- .env           # should return nothing
#        git log --all --full-history -- knowledge.db   # should return nothing
#
#   4. Re-add the remote (git-filter-repo removes it) and force-push to GitHub:
#        git remote add origin https://github.com/iamtouchskyer/math-project-server.git
#        git push origin --force --all
#        git push origin --force --tags
#
#   5. ROTATE ALL CREDENTIALS that were exposed:
#        - JWT_SECRET
#        - SESSION_SECRET
#        - Google OAuth credentials
#        - Stripe API keys
#        - Apple Sign-In configuration
#        - WeChat OAuth secrets
#        - Admin credentials
#
# =============================================================================

set -euo pipefail

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

echo -e "${YELLOW}╔══════════════════════════════════════════════════════╗${NC}"
echo -e "${YELLOW}║ Git History Cleanup — Removing Sensitive Files ║${NC}"
echo -e "${YELLOW}╚══════════════════════════════════════════════════════╝${NC}"
echo ""

# Check prerequisites
if ! command -v git-filter-repo &> /dev/null; then
  echo -e "${RED}Error: git-filter-repo is not installed.${NC}"
  echo "Install it with: pip install git-filter-repo"
  exit 1
fi

# Verify we're in a git repo
if ! git rev-parse --is-inside-work-tree &> /dev/null; then
  echo -e "${RED}Error: Not in a git repository.${NC}"
  exit 1
fi

# Show what will be removed
echo -e "${YELLOW}Files to remove from history:${NC}"
echo ""

FILES_TO_REMOVE=(
  ".env"
  ".env.development"
  ".env.local"
  "knowledge.db"
  "data/backup/knowledge_backup.db"
  "data/backup/knowledge_detailed_description_2025-04-21T00-33-06-807Z.db"
)

for file in "${FILES_TO_REMOVE[@]}"; do
  COMMITS=$(git log --all --oneline -- "$file" 2>/dev/null | wc -l)
  if [ "$COMMITS" -gt 0 ]; then
    echo -e "  ${RED}✗${NC} $file (found in $COMMITS commits)"
  else
    echo -e "  ${GREEN}✓${NC} $file (not found — already clean)"
  fi
done

echo ""
echo -e "${YELLOW}⚠️ This will PERMANENTLY rewrite git history.${NC}"
echo -e "${YELLOW} All collaborators must re-clone after this.${NC}"
echo ""
read -p "Continue? (yes/no): " CONFIRM

if [ "$CONFIRM" != "yes" ]; then
  echo "Aborted."
  exit 0
fi

echo ""
echo -e "${GREEN}Running git-filter-repo...${NC}"

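# Build the filter-repo arguments as an array so each path stays one argument
ARGS=()
for file in "${FILES_TO_REMOVE[@]}"; do
  ARGS+=(--path "$file")
done

# Run git-filter-repo to remove the files.
# --force skips filter-repo's fresh-clone safety check, so only run this in a
# disposable clone.
git filter-repo --invert-paths "${ARGS[@]}" --force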

echo ""
echo -e "${GREEN}╔══════════════════════════════════════════════════════╗${NC}"
echo -e "${GREEN}║ ✅ Cleanup complete! ║${NC}"
echo -e "${GREEN}╚══════════════════════════════════════════════════════╝${NC}"
echo ""
echo -e "Next steps:"
echo -e " 1. Verify: ${YELLOW}git log --all --full-history -- .env${NC}"
echo -e " 2. Re-add remote: ${YELLOW}git remote add origin https://github.com/iamtouchskyer/math-project-server.git${NC}"
echo -e " 3. Force-push: ${YELLOW}git push origin --force --all && git push origin --force --tags${NC}"
echo -e " 4. ${RED}ROTATE ALL EXPOSED CREDENTIALS${NC}"
echo -e " 5. Tell collaborators to re-clone"
echo ""