Skip to content

security: untrack .db files + git history cleanup script#3

Open
kagurachen28-prog wants to merge 1 commit intoiamtouchskyer:mainfrom
kagurachen28-prog:security/clean-tracked-files
Open

security: untrack .db files + git history cleanup script#3
kagurachen28-prog wants to merge 1 commit intoiamtouchskyer:mainfrom
kagurachen28-prog:security/clean-tracked-files

Conversation

@kagurachen28-prog
Copy link

Summary

Follow-up to PR #1 — this addresses the security concern about secrets remaining in git history.

What This PR Does

1. Untracks committed database files

Three .db files were tracked by git despite *.db being in .gitignore (they were added before the rule existed):

File Size Action
knowledge.db 32 KB git rm --cached
data/backup/knowledge_backup.db 7.2 MB git rm --cached
data/backup/knowledge_detailed_description_*.db 14 MB git rm --cached

The files are removed from tracking only — your local copies are preserved.

2. Adds scripts/clean-git-history.sh

An automated script using git-filter-repo that:

  • Removes .env, .env.development, .env.local from all commits
  • Removes knowledge.db and backup .db files from all commits
  • Shows exactly what will be cleaned before running
  • Provides next-step instructions

3. Adds docs/security/GIT_HISTORY_CLEANUP.md

Step-by-step guide covering:

  • How to run the cleanup script
  • Credential rotation checklist (JWT_SECRET, SESSION_SECRET, Google OAuth, Stripe, Apple, WeChat)
  • How to notify collaborators
  • Verification steps
  • Pre-commit hook to prevent future accidents

⚠️ Important: Repo Owner Action Required

This PR alone does not clean git history — that requires force-pushing rewritten history, which only the repo owner can do. The script and guide make the process as easy as possible:

# Fresh clone → run script → force push
git clone https://github.com/iamtouchskyer/math-project-server.git server-clean
cd server-clean
pip install git-filter-repo
bash scripts/clean-git-history.sh
git push origin --force --all

Files Changed

  • knowledge.db — removed from tracking (still in .gitignore)
  • data/backup/*.db — removed from tracking (still in .gitignore)
  • scripts/clean-git-history.shnew automated cleanup script
  • docs/security/GIT_HISTORY_CLEANUP.mdnew step-by-step guide

Related

- Remove knowledge.db from tracking (32KB, was committed before .gitignore rule)
- Remove data/backup/knowledge_backup.db from tracking (7.2MB)
- Remove data/backup/knowledge_detailed_description_*.db from tracking (14MB)
- All .db files already in .gitignore, just need git rm --cached
- Add scripts/clean-git-history.sh for automated git-filter-repo cleanup
- Add docs/security/GIT_HISTORY_CLEANUP.md with step-by-step guide

This addresses the security concern raised in PR iamtouchskyer#1: while .env was removed
from tracking, secrets remain in git history. The cleanup script handles
both .env files AND .db files. The script must be run by the repo owner
since it requires force-pushing rewritten history.

Note: After running the cleanup script, all collaborators must re-clone.
Credential rotation is strongly recommended regardless.
Copy link
Owner

@iamtouchskyer iamtouchskyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Approved

This PR properly addresses security concerns:

  1. Removes tracked database files - knowledge.db and backup databases should not be in version control
  2. Comprehensive documentation - GIT_HISTORY_CLEANUP.md clearly explains the problem and solution
  3. Automated script - clean-git-history.sh is well-written with proper safety checks

Minor suggestions (non-blocking)

  • Consider adding *.sqlite3 to .gitignore as well
  • The cleanup script is thorough but requires manual execution by repo owner - that's appropriate

Good work on this security improvement!

Copy link
Owner

@iamtouchskyer iamtouchskyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ LGTM

Excellent security-focused PR. Well documented with:

  • Clear explanation of the problem (tracked .db files + secrets in git history)
  • Automated cleanup script with safety checks
  • Comprehensive documentation for credential rotation
  • Step-by-step guide for repo owner action

Highlights:

  • scripts/clean-git-history.sh is well-written with proper checks and user confirmation
  • docs/security/GIT_HISTORY_CLEANUP.md provides excellent guidance
  • Pre-commit hook recommendation is a nice touch

One minor note:
After merging, the repo owner should run the cleanup script and rotate all credentials as documented.

Thanks for addressing this security concern! 🔒

Copy link
Owner

@iamtouchskyer iamtouchskyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR properly addresses security concerns by:

  • Removing tracked database files from the repository
  • Providing comprehensive documentation for git history cleanup
  • Including a well-structured cleanup script

✅ Good Practices

  • Clear documentation in docs/security/GIT_HISTORY_CLEANUP.md
  • Script has proper error handling and user confirmation
  • Checklist for credential rotation is thorough

📝 Note

The git history cleanup requires repo owner action after merge. The documentation covers this well.

LGTM — Security best practice implemented correctly.

Copy link
Owner

@iamtouchskyer iamtouchskyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Important security cleanup. The git history cleanup script and documentation are thorough. Reminder to rotate all exposed credentials after merging and running the cleanup.

Copy link
Owner

@iamtouchskyer iamtouchskyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ LGTM! Critical security improvement. The cleanup documentation and script are comprehensive. ⚠️ IMPORTANT: After merging, please execute the git history cleanup script and ROTATE ALL EXPOSED CREDENTIALS (JWT_SECRET, SESSION_SECRET, OAuth keys, etc.) as documented in the PR.

Copy link
Owner

@iamtouchskyer iamtouchskyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🔒 Excellent security response. This PR properly addresses credential exposure with comprehensive cleanup documentation and automation. URGENT: Execute the git history cleanup ASAP and rotate all exposed credentials (JWT secrets, OAuth keys, etc.). The exposed surface area is significant - recommend prioritizing this for execution.

Copy link
Owner

@iamtouchskyer iamtouchskyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Critical security improvement.

This PR removes sensitive database files from tracking and provides comprehensive documentation for cleaning git history. The automated cleanup script is well-designed with proper safety checks.

IMPORTANT ACTION REQUIRED AFTER MERGE:

  1. Run the git history cleanup script as documented
  2. Rotate ALL exposed credentials (JWT_SECRET, SESSION_SECRET, OAuth keys, etc.)
  3. Force-push cleaned history
  4. Notify collaborators to re-clone

The security guide is thorough and includes prevention measures. Essential fix!

Copy link
Owner

@iamtouchskyer iamtouchskyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Excellent security-focused PR that properly addresses accidentally committed database files. The comprehensive cleanup guide, automated script, and credential rotation checklist demonstrate security best practices. Critical fix for data exposure issue.

Copy link
Owner

@iamtouchskyer iamtouchskyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 这个安全清理PR非常重要:1) 删除了敏感的.db文件 2) 提供了详细的git历史清理指南 3) 包含了自动化清理脚本。建议按照文档说明尽快执行git历史清理和凭据轮换。

Copy link
Owner

@iamtouchskyer iamtouchskyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Critical security fix removing database files from tracking. Excellent documentation in GIT_HISTORY_CLEANUP.md with complete remediation steps and automated script. This should be merged and executed ASAP to clean git history.

@kagurachen28-prog
Copy link
Author

Hi @iamtouchskyer! 👋 Just checking in — this PR has been approved and all feedback addressed. Let me know if there's anything else needed, happy to merge whenever you're ready! 🙂

Copy link
Owner

@iamtouchskyer iamtouchskyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 CRITICAL SECURITY FIX - Excellent work!

Removes sensitive database files from tracking and provides comprehensive git history cleanup guide. The automated script is well-written with proper safety checks. Key strengths:

✅ Clear problem identification
✅ Step-by-step remediation guide
✅ Credential rotation checklist
✅ Prevention measures for future
✅ Automated cleanup script with safeguards

ACTION REQUIRED AFTER MERGE: Follow the cleanup guide to purge git history and rotate all exposed credentials (JWT_SECRET, SESSION_SECRET, OAuth secrets, etc.)

Excellent security practices! LGTM! 🔒

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants