A powerful GitHub repository fork analysis tool that automatically discovers valuable features across all forks of a repository, ranks them by impact, and can create pull requests to integrate the best improvements back to the upstream project.
- Fork Discovery: Automatically finds and catalogs all public forks of a repository
- Feature Analysis: Identifies meaningful changes and improvements in each fork
- Smart Ranking: Scores features based on code quality, community engagement, and impact
- Report Generation: Creates comprehensive markdown reports with feature summaries
- Automated PRs: Can automatically create pull requests for high-value features
- Caching: Intelligent caching system to avoid redundant API calls
- Python 3.12 or higher
- uv package manager
```bash
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip
pip install uv
```

```bash
# Install with pip
pip install forkscout

# Or with uv
uv add forkscout
```

```bash
# Clone the repository
git clone https://github.com/Romamo/forkscout.git
cd forkscout

# Install dependencies
uv sync

# Install in development mode
uv pip install -e .
```
- Set up your GitHub token:

  ```bash
  cp .env.example .env
  # Edit .env and add your GitHub token
  ```

- Analyze a repository:

  ```bash
  uv run forkscout analyze https://github.com/pallets/click
  ```

- Generate a report:

  ```bash
  uv run forkscout analyze https://github.com/psf/requests --output report.md
  ```

- Auto-create PRs for high-value features:

  ```bash
  uv run forkscout analyze https://github.com/Textualize/rich --auto-pr --min-score 80
  ```
Create a forkscout.yaml configuration file:
```yaml
github:
  token: ${GITHUB_TOKEN}

scoring:
  code_quality_weight: 0.3
  community_engagement_weight: 0.2
  test_coverage_weight: 0.2
  documentation_weight: 0.15
  recency_weight: 0.15

analysis:
  min_score_threshold: 70.0
  max_forks_to_analyze: 100
  excluded_file_patterns:
    - "*.md"
    - "*.txt"
    - ".github/*"

# Commit counting configuration
commit_count:
  max_count_limit: 100           # Maximum commits to count per fork (0 = unlimited)
  display_limit: 5               # Maximum commits to show in display
  use_unlimited_counting: false  # Enable unlimited counting by default
  timeout_seconds: 30            # Timeout for commit counting operations

cache:
  duration_hours: 24
  max_size_mb: 100
```
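The scoring weights above control how per-feature factors are combined into the 0-100 score that `min_score_threshold` and the `--min-score` option are compared against. Below is a minimal sketch of how such a weighted score might be computed, assuming each factor is already normalized to the 0-1 range; the function and factor names mirror the config keys but are illustrative, not Forkscout's internal API.

```python
# Illustrative sketch only: assumes each factor is pre-normalized to 0.0-1.0.
WEIGHTS = {
    "code_quality": 0.3,
    "community_engagement": 0.2,
    "test_coverage": 0.2,
    "documentation": 0.15,
    "recency": 0.15,
}

def feature_score(factors: dict[str, float]) -> float:
    """Combine per-factor scores into a 0-100 score using the configured weights."""
    weighted = sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)
    return round(100 * weighted, 1)

# Example: a well-tested, documented feature from an active fork
print(feature_score({
    "code_quality": 0.9,
    "community_engagement": 0.5,
    "test_coverage": 0.8,
    "documentation": 0.7,
    "recency": 1.0,
}))  # 78.5 -> passes the default 70.0 threshold
```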
```bash
forkscout analyze https://github.com/pallets/click
```

```bash
# Show all forks with compact commit status
forkscout show-forks https://github.com/psf/requests

# Show forks with recent commits in a separate column
forkscout show-forks https://github.com/Textualize/rich --show-commits 3

# Show detailed fork information with exact commit counts
forkscout show-forks https://github.com/pytest-dev/pytest --detail
```

```bash
# Basic exact commit counting (default: count up to 100 commits)
forkscout show-forks https://github.com/newmarcel/KeepingYouAwake --detail

# Unlimited commit counting for maximum accuracy (slower)
forkscout show-forks https://github.com/aarigs/pandas-ta --detail --max-commits-count 0

# Fast processing with lower commit limit
forkscout show-forks https://github.com/NoMore201/googleplay-api --detail --max-commits-count 50

# Custom display limit for commit messages
forkscout show-forks https://github.com/sanila2007/youtube-bot-telegram --show-commits 3 --commit-display-limit 10

# Focus on active forks only
forkscout show-forks https://github.com/maliayas/github-network-ninja --detail --ahead-only
```

The fork tables display commit status in a compact "+X -Y" format:

- `+5 -2` means 5 commits ahead, 2 commits behind
- `+3` means 3 commits ahead, up to date
- `-1` means 1 commit behind, no new commits
- Empty cell means completely up to date
- `Unknown` means status could not be determined
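For reference, here is a small sketch of how the compact status string can be derived from ahead/behind counts. The helper is hypothetical and for illustration only; it is not part of the Forkscout CLI or API.

```python
def compact_status(ahead: int | None, behind: int | None) -> str:
    """Render ahead/behind counts in the compact "+X -Y" format described above."""
    if ahead is None or behind is None:
        return "Unknown"          # status could not be determined
    parts = []
    if ahead:
        parts.append(f"+{ahead}")
    if behind:
        parts.append(f"-{behind}")
    return " ".join(parts)        # empty string = completely up to date

assert compact_status(5, 2) == "+5 -2"
assert compact_status(3, 0) == "+3"
assert compact_status(0, 1) == "-1"
assert compact_status(0, 0) == ""
assert compact_status(None, None) == "Unknown"
```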
```bash
forkscout analyze https://github.com/virattt/ai-hedge-fund --config my-config.yaml
```

```bash
forkscout analyze https://github.com/xgboosted/pandas-ta-classic --auto-pr --min-score 85
```

```bash
forkscout analyze https://github.com/pallets/click --verbose
```

Commit counts showing "+1" for all forks:
- This was a bug in earlier versions. Update to the latest version.
- Use the `--detail` flag for accurate commit counting.

Slow performance with commit counting:

- Use `--max-commits-count 50` for faster processing
- Limit forks with `--max-forks 25`
- Use `--ahead-only` to skip inactive forks

"Unknown" commit counts:

- Usually indicates private/deleted forks or API rate limiting
- Check GitHub token configuration
- Try with `--verbose` for detailed error information
For comprehensive troubleshooting, see docs/COMMIT_COUNTING_TROUBLESHOOTING.md.
```bash
# Clone and setup
git clone https://github.com/Romamo/forkscout.git
cd forkscout
uv sync --dev

# Install pre-commit hooks
uv run pre-commit install
```

```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src --cov-report=html

# Run only unit tests
uv run pytest tests/unit/

# Run only integration tests
uv run pytest tests/integration/
```

```bash
# Format code
uv run black src/ tests/

# Lint code
uv run ruff check src/ tests/

# Type checking
uv run mypy src/
```

Forkscout uses a sophisticated evaluation system to analyze commits and determine their value for the main repository. This section explains how the system makes decisions about commit categorization, impact assessment, and value determination.
The system categorizes each commit into one of the following types based on commit message patterns and file changes:
🚀 Feature - New functionality or enhancements

- Message patterns: `feat:`, `feature`, `implement`, `new`, `add`, `introduce`, `create`, `build`, `support for`, `enable`
- Examples:
  - `feat: add user authentication system`
  - `implement OAuth2 login flow`
  - `add support for PostgreSQL database`
🐛 Bugfix - Error corrections and issue resolutions

- Message patterns: `fix:`, `bug`, `patch`, `hotfix`, `repair`, `resolve`, `correct`, `address`, `issue`, `problem`, `error`
- Examples:
  - `fix: resolve memory leak in data processing`
  - `correct validation error in user input`
  - `patch security vulnerability in auth module`
🔧 Refactor - Code improvements without functional changes

- Message patterns: `refactor:`, `clean`, `improve`, `restructure`, `reorganize`, `simplify`, `extract`, `rename`, `move`
- Examples:
  - `refactor: extract common validation logic`
  - `improve code organization in user module`
  - `simplify database connection handling`
📚 Documentation - Documentation updates and improvements

- Message patterns: `docs:`, `documentation`, `readme`, `comment`, `comments`, `docstring`, `guide`, `tutorial`, `example`
- File patterns: `README.*`, `*.md`, `*.rst`, `docs/`, `*.txt`
- Examples:
  - `docs: update installation instructions`
  - `add API documentation for user endpoints`
  - `improve code comments in core modules`
🧪 Test - Test additions and improvements

- Message patterns: `test:`, `tests`, `testing`, `spec`, `unittest`, `pytest`, `coverage`, `mock`, `fixture`, `assert`
- File patterns: `test_*.py`, `*_test.py`, `tests/`, `*.test.js`, `*.spec.js`
- Examples:
  - `test: add unit tests for user service`
  - `improve test coverage for authentication`
  - `add integration tests for API endpoints`
🔨 Chore - Maintenance and build-related changes

- Message patterns: `chore:`, `maintenance`, `upgrade`, `dependency`, `dependencies`, `version`, `config`, `configuration`, `setup`
- File patterns: `requirements.txt`, `package.json`, `pyproject.toml`, `setup.py`, `Dockerfile`, `.github/`, `.gitignore`
- Examples:
  - `chore: update dependencies to latest versions`
  - `upgrade Python to 3.12`
  - `configure CI/CD pipeline`
⚡ Performance - Performance optimizations

- Message patterns: `perf:`, `performance`, `speed`, `fast`, `optimize`, `optimization`, `efficient`, `cache`, `caching`, `memory`
- Examples:
  - `perf: optimize database query performance`
  - `improve memory usage in data processing`
  - `add caching layer for API responses`
🔒 Security - Security-related changes

- Message patterns: `security:`, `secure`, `vulnerability`, `auth`, `authentication`, `authorization`, `encrypt`, `decrypt`, `hash`
- File patterns: `*auth*.py`, `*security*.py`, `*crypto*.py`
- Examples:
  - `security: fix SQL injection vulnerability`
  - `implement secure password hashing`
  - `add rate limiting to API endpoints`
❓ Other - Changes that don't fit standard categories

- Used when commit patterns don't match any specific category
- Often indicates complex or unclear changes
The system evaluates the potential impact of each commit using multiple factors:
Files are assessed for criticality based on their role in the project:
🔴 Critical Files (Score: 1.0)

- Core application files: `main.py`, `index.js`, `app.py`, `server.py`
- Entry points: `__init__.py`, `setup.py`, `pyproject.toml`, `package.json`
- Files explicitly listed in the project's critical files

🟠 High Criticality (Score: 0.8-0.9)

- Security files: `*auth*.py`, `*security*.py`, `*crypto*.py`, `*permission*.py`
- Configuration files: `config.*`, `settings.*`, `.env*`, `Dockerfile`, `docker-compose.yml`

🟡 Medium-High Criticality (Score: 0.7)

- Database/model files: `*model*.py`, `*schema*.py`, `*migration*.py`, `*database*.py`

🟢 Medium Criticality (Score: 0.6)

- API/interface files: `*api*.py`, `*endpoint*.py`, `*route*.py`, `*controller*.py`

🔵 Low Criticality (Score: 0.1-0.2)

- Test files: `test_*.py`, `*_test.py`, `tests/`, `*.test.js`, `*.spec.js`
- Documentation: `README.*`, `*.md`, `*.rst`, `docs/`
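A minimal sketch of pattern-based criticality scoring along these lines is shown below. The tiers mirror the list above, but the helper, its default score, and the exact pattern-to-score mapping are illustrative assumptions rather than Forkscout's actual rules.

```python
from fnmatch import fnmatch

# Illustrative pattern -> score tiers following the list above (directory-based
# patterns like tests/ and docs/ are omitted since only basenames are matched here).
CRITICALITY_TIERS = [
    (["main.py", "index.js", "app.py", "server.py", "__init__.py",
      "setup.py", "pyproject.toml", "package.json"], 1.0),
    (["*auth*.py", "*security*.py", "*crypto*.py", "*permission*.py",
      "config.*", "settings.*", ".env*", "Dockerfile", "docker-compose.yml"], 0.9),
    (["*model*.py", "*schema*.py", "*migration*.py", "*database*.py"], 0.7),
    (["*api*.py", "*endpoint*.py", "*route*.py", "*controller*.py"], 0.6),
    (["test_*.py", "*_test.py", "*.test.js", "*.spec.js", "README.*", "*.md", "*.rst"], 0.2),
]

def file_criticality(path: str) -> float:
    """Return the score of the first matching tier, or an assumed 0.5 default."""
    name = path.rsplit("/", 1)[-1]
    for patterns, score in CRITICALITY_TIERS:
        if any(fnmatch(name, pattern) for pattern in patterns):
            return score
    return 0.5

print(file_criticality("src/app.py"))         # 1.0
print(file_criticality("src/user_model.py"))  # 0.7
print(file_criticality("tests/test_cli.py"))  # 0.2
```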
The system calculates change magnitude based on:
- Lines changed: Additions + deletions (weighted 70%)
- Files changed: Number of modified files (weighted 30%)
- Size bonuses: Large changes (>500 lines) get 1.5x multiplier, medium changes (>200 lines) get 1.2x multiplier
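A rough sketch of that calculation under one possible normalization follows; only the 70%/30% weighting and the 1.5x/1.2x bonuses come from the description above, while the caps used to map raw counts into a 0-1 range are assumptions.

```python
def change_magnitude(lines_changed: int, files_changed: int) -> float:
    """Combine lines (70%) and files (30%) into a 0-1 magnitude score."""
    # Assumed normalization caps: 500 lines / 20 files count as "maximal".
    lines_score = min(lines_changed / 500, 1.0)
    files_score = min(files_changed / 20, 1.0)
    magnitude = 0.7 * lines_score + 0.3 * files_score

    # Size bonuses from the description above.
    if lines_changed > 500:
        magnitude *= 1.5
    elif lines_changed > 200:
        magnitude *= 1.2
    return min(magnitude, 1.0)

print(change_magnitude(lines_changed=350, files_changed=4))  # 0.66: medium change, 1.2x bonus
```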
Test Coverage Factor
- Measures proportion of test files in the change
- Bonus points for including any test files
- Score: 0.0 (no tests) to 1.0 (comprehensive test coverage)
Documentation Factor
- Measures proportion of documentation files
- Bonus points for including any documentation
- Score: 0.0 (no docs) to 1.0 (comprehensive documentation)
Code Organization Factor
- Evaluates focus and coherence of changes
- Bonus for focused changes (β€3 files)
- Penalty for scattered changes (>10 files)
- Considers average changes per file
Commit Quality Factor
- Message length and descriptiveness
- Conventional commit format bonus
- Penalty for merge commits
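A condensed sketch of how these four factors could be scored is given below; the signals match the descriptions above, but the specific bonus and penalty values are illustrative assumptions.

```python
def quality_factors(files: list[str], message: str) -> dict[str, float]:
    """Score the four quality signals described above on a 0-1 scale (illustrative values)."""
    total = max(len(files), 1)
    tests = [f for f in files if "test" in f.lower()]
    docs = [f for f in files if f.lower().endswith((".md", ".rst")) or f.startswith("docs/")]

    # Proportion of test/doc files, plus a small bonus for including any at all.
    test_coverage = min(len(tests) / total + (0.2 if tests else 0.0), 1.0)
    documentation = min(len(docs) / total + (0.2 if docs else 0.0), 1.0)

    # Code organization: reward focused changes, penalize scattered ones.
    if len(files) <= 3:
        organization = 0.9
    elif len(files) > 10:
        organization = 0.3
    else:
        organization = 0.6

    # Commit quality: descriptive, conventional-format, non-merge messages score higher.
    commit_quality = 0.5
    if len(message) > 30:
        commit_quality += 0.2
    if message.split(" ", 1)[0].endswith(":"):   # conventional commit prefix like "feat:"
        commit_quality += 0.2
    if message.lower().startswith("merge"):
        commit_quality -= 0.3

    return {
        "test_coverage": test_coverage,
        "documentation": documentation,
        "code_organization": organization,
        "commit_quality": min(max(commit_quality, 0.0), 1.0),
    }
```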
The system combines all factors to determine overall impact:
- 🔴 Critical (Score ≥ 0.8): Major changes to critical files with high quality
- 🟠 High (Score ≥ 0.6): Significant changes to important files
- 🟡 Medium (Score ≥ 0.3): Moderate changes with reasonable scope
- 🟢 Low (Score < 0.3): Minor changes or low-impact files
The system determines whether each commit could be valuable for the main repository:
Automatic "Yes" Categories:
- Bugfixes: Error corrections benefit all users
- Security fixes: Critical for all installations
- Performance improvements: Speed benefits everyone
- Documentation: Helps all users understand the project
- Tests: Improve reliability for everyone
Conditional "Yes" Examples:
- Features: Substantial new functionality (>50 lines changed)
- Refactoring: Significant code improvements
- Dependency updates: Security or compatibility improvements
Example "Yes" Commits:
```
✅ fix: resolve memory leak in data processing loop
✅ security: patch SQL injection vulnerability in user queries
✅ perf: optimize database connection pooling (40% faster)
✅ feat: add comprehensive input validation system
✅ docs: add troubleshooting guide for common errors
✅ test: add integration tests for payment processing
```
Typical "No" Scenarios:
- Fork-specific configurations or customizations
- Environment-specific changes
- Personal preferences or styling
- Changes that break compatibility
- Experimental or incomplete features
Example "No" Commits:
```
❌ chore: update personal development environment setup
❌ feat: add company-specific branding and logos
❌ config: change database from PostgreSQL to MongoDB for our use case
❌ style: reformat code according to personal preferences
❌ feat: add integration with internal company API
```
Typical "Unclear" Scenarios:
- Small features that might be too specific
- Refactoring without clear benefits
- Complex changes that do multiple things
- Changes with insufficient context
- Experimental or unfinished work
Example "Unclear" Commits:
```
❓ refactor: minor code cleanup in utility functions
❓ feat: add small convenience method for date formatting
❓ fix: workaround for edge case in specific environment
❓ update: misc changes and improvements
❓ feat: experimental feature for advanced users
```
```
1. Check commit message for conventional commit prefix (feat:, fix:, etc.)
   ├─ If found → Use prefix category with high confidence (0.9)
   └─ If not found → Continue to pattern matching

2. Analyze commit message for category keywords
   ├─ Multiple matches → Use highest priority match
   └─ No matches → Continue to file analysis

3. Analyze changed files for category patterns
   ├─ Strong file pattern match (>80% files) → Use file category
   └─ Weak or mixed patterns → Continue to combination logic

4. Combine message and file analysis
   ├─ Message and files agree → Boost confidence (+0.2)
   ├─ Message confidence > File confidence → Use message category
   ├─ File confidence > Message confidence → Use file category
   └─ Equal confidence → Default to message category or OTHER
```
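Below is a simplified sketch of the first two steps of this flow (conventional-commit prefix check, then keyword matching). The keyword table is abbreviated and the fallback confidence values are assumptions; only the 0.9 prefix confidence comes from the flow above.

```python
# Abbreviated keyword table; the full pattern lists are shown in the category
# descriptions above.
PREFIXES = {
    "feat": "feature", "fix": "bugfix", "docs": "documentation", "test": "test",
    "refactor": "refactor", "chore": "chore", "perf": "performance", "security": "security",
}
KEYWORDS = {
    "bugfix": ["bug", "patch", "resolve", "hotfix", "error"],
    "feature": ["implement", "add", "introduce", "support for"],
    "documentation": ["readme", "docstring", "guide", "tutorial"],
    "performance": ["optimize", "speed", "caching"],
}

def categorize(message: str) -> tuple[str, float]:
    """Return (category, confidence) from the commit message alone; file analysis would follow."""
    msg = message.lower()
    prefix = msg.split(":", 1)[0].strip()
    if ":" in msg and prefix in PREFIXES:        # step 1: conventional commit prefix
        return PREFIXES[prefix], 0.9
    for category, words in KEYWORDS.items():     # step 2: keyword matching
        if any(word in msg for word in words):
            return category, 0.6                 # assumed confidence for keyword matches
    return "other", 0.3                          # assumed fallback confidence

print(categorize("feat: add user authentication system"))       # ('feature', 0.9)
print(categorize("Resolve crash when config file is missing"))  # ('bugfix', 0.6)
```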
```
1. Calculate Change Magnitude
   ├─ Count lines changed (additions + deletions)
   ├─ Count files changed
   └─ Apply size multipliers for large changes

2. Assess File Criticality
   ├─ Check against critical file patterns
   ├─ Calculate weighted average by change size
   └─ Return criticality score (0.0 to 1.0)

3. Evaluate Quality Factors
   ├─ Test coverage: Proportion of test files
   ├─ Documentation: Proportion of doc files
   ├─ Code organization: Focus and coherence
   └─ Commit quality: Message and format quality

4. Determine Impact Level
   ├─ Combine: 40% magnitude + 40% criticality + 20% quality
   ├─ Score ≥ 0.8 → Critical
   ├─ Score ≥ 0.6 → High
   ├─ Score ≥ 0.3 → Medium
   └─ Score < 0.3 → Low
```
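Putting the 40/40/20 combination and the thresholds above into a small sketch (factor inputs are assumed to be normalized to 0-1, as in the earlier sketches):

```python
def impact_level(magnitude: float, criticality: float, quality: float) -> str:
    """Combine the three 0-1 factors (40% / 40% / 20%) and map to an impact level."""
    score = 0.4 * magnitude + 0.4 * criticality + 0.2 * quality
    if score >= 0.8:
        return "critical"
    if score >= 0.6:
        return "high"
    if score >= 0.3:
        return "medium"
    return "low"

# Example: a sizeable change to core files with decent tests and docs.
print(impact_level(magnitude=0.66, criticality=0.9, quality=0.7))  # high (score 0.764)
```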
```
1. Check Category Type
   ├─ Bugfix/Security/Performance → Automatic "Yes"
   ├─ Docs/Test → Automatic "Yes"
   └─ Feature/Refactor/Chore → Continue evaluation

2. Analyze Change Scope
   ├─ Substantial changes (>50 lines) → Likely "Yes"
   ├─ Small changes (<20 lines) → Likely "Unclear"
   └─ Medium changes → Continue evaluation

3. Check for Fork-Specific Indicators
   ├─ Personal/company-specific terms → "No"
   ├─ Environment-specific configs → "No"
   └─ Generic improvements → Continue evaluation

4. Final Assessment
   ├─ Clear benefit to all users → "Yes"
   ├─ Clearly fork-specific → "No"
   └─ Uncertain or context-dependent → "Unclear"
```
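The same flow as a compact sketch; the fork-specific indicator list is an illustrative stand-in for whatever heuristics Forkscout actually applies, and the ordering of the scope and fork-specific checks is simplified here.

```python
AUTO_YES = {"bugfix", "security", "performance", "documentation", "test"}
# Assumed example indicators; the real heuristics are not documented here.
FORK_SPECIFIC_HINTS = ["company", "internal", "personal", "our use case"]

def value_for_upstream(category: str, lines_changed: int, message: str) -> str:
    """Return "Yes", "No", or "Unclear" following the decision steps above (simplified)."""
    if category in AUTO_YES:                               # step 1: automatic "Yes" categories
        return "Yes"
    msg = message.lower()
    if any(hint in msg for hint in FORK_SPECIFIC_HINTS):   # step 3 (checked early here)
        return "No"
    if lines_changed > 50:                                 # step 2: substantial change
        return "Yes"
    if lines_changed < 20:                                 # step 2: small change
        return "Unclear"
    return "Unclear"                                       # step 4: context-dependent

print(value_for_upstream("feature", 120, "feat: add comprehensive input validation"))          # Yes
print(value_for_upstream("feature", 200, "feat: add integration with internal company API"))   # No
```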
Possible reasons:
- Commit message doesn't match known patterns
- Mixed file types that don't clearly indicate category
- Generic or unclear commit message
Solutions:
- Use conventional commit format: `feat:`, `fix:`, `docs:`, etc.
- Write descriptive commit messages with clear action words
- Focus commits on single types of changes
Common causes:
- Changes affect low-criticality files (tests, docs)
- Small change magnitude (few lines/files changed)
- Poor commit quality (short message, merge commit)
- Low quality factors (no tests or docs included)
To increase impact:
- Include changes to core application files
- Add tests and documentation with your changes
- Write descriptive commit messages
- Make focused, substantial changes
Typical reasons:
- Feature appears too specific or niche
- Insufficient context to determine general usefulness
- Small or experimental change
- Complex commit that does multiple things
To improve assessment:
- Write clear commit messages explaining the benefit
- Include documentation explaining the feature
- Make focused commits that do one thing well
- Consider if the feature would help other users
Possible issues:
- Commit message doesn't include security keywords
- Files don't match security patterns
- Change appears as refactoring or other category
Improvements:
- Use security-related keywords: `security`, `vulnerability`, `auth`, `secure`
- Use conventional commit format: `security: fix vulnerability in...`
- Include security-related files in the change
Common causes:
- Files don't match documentation patterns
- Commit message uses maintenance-related words
- Mixed changes including config files
Solutions:
- Use doc-specific keywords: `docs`, `documentation`, `readme`
- Focus commits on documentation files only
- Use conventional commit format: `docs: update installation guide`
When using the --explain flag, you'll see structured output with clear separation between factual descriptions and system assessments:
```
📝 Description: Added user authentication middleware to handle JWT tokens

⚖️ Assessment: Value for main repo: YES
Category: 🚀 Feature | Impact: 🔴 High
Reasoning: Large changes affecting critical security files with test coverage
```
Key sections:
- 📝 Description: Factual description of what changed
- ⚖️ Assessment: System's evaluation and judgment
- Category: Determined commit type with confidence
- Impact: Assessed impact level with reasoning
- Value: Whether this could help the main repository
This separation helps you distinguish between objective facts about the commit and the system's subjective assessment of its value.
The system uses consistent visual indicators to help you quickly scan results:
Category Icons:
- 🚀 Feature - New functionality
- 🐛 Bugfix - Error corrections
- 🔧 Refactor - Code improvements
- 📚 Documentation - Docs and guides
- 🧪 Test - Testing improvements
- 🔨 Chore - Maintenance tasks
- ⚡ Performance - Speed optimizations
- 🔒 Security - Security fixes
- ❓ Other - Uncategorized changes
Impact Level Colors:
- 🔴 Critical - Major system changes
- 🟠 High - Significant improvements
- 🟡 Medium - Moderate changes
- 🟢 Low - Minor modifications
Value Assessment:
- ✅ Yes - Valuable for main repository
- ❌ No - Fork-specific only
- ❓ Unclear - Needs further review
Complexity Indicators:
- ⚠️ Complex commits that do multiple things are flagged for careful review
- Simple, focused commits are preferred for easier integration
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Add tests for your changes
- Ensure all tests pass (`uv run pytest`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.