ccontext

ccontext (collect-context) is a cross-platform utility designed to streamline the process of gathering and sending the context of a directory to large language models (LLMs) like ChatGPT-4o. Our mission is to make collecting and sending context to an LLM as easy as possible.

🚀 Demo: Witness ccontext in Action! 🎥

⚠️ Warning: You May Be Amazed! 🤯

ccontext_demo.mov

Features

🌟 Easy Setup: Quick installation and configuration.
🌍 Cross-Platform Support: Supports Windows, macOS, and Linux.
💾 Binary File Support: Handle various binary files including PDFs, Word documents, images, audio, and video files.
📄 Markdown and PDF Generation: Generate detailed Markdown and PDF files of the directory structure and file contents.
🌐 Crawling of (documentation) Sites: Crawl and gather data from multiple sites using a specified list of URLs.
✂️ Tokenization and Chunking: Automatically handles tokenization and chunking to stay within LLM token limits.
🔧 Configurable Exclusions and Inclusions: Flexibly specify which files and directories to include or exclude.
🗣️ Verbose Output: Optional verbose mode for detailed output and debugging.
📝 Prompt Templates (Upcoming): Create and use custom templates for different types of prompts.

Installation

Using pipx (Recommended)

We recommend installing ccontext using pipx. pipx is a tool that lets you install and run Python applications in isolated environments, ensuring clean installation and easy management of CLI applications.

First, install pipx if you haven't already:

# On macOS
brew install pipx
pipx ensurepath

# On Ubuntu/Debian
sudo apt install pipx
pipx ensurepath

# On Windows
python -m pip install --user pipx
python -m pipx ensurepath
# or read https://pipx.pypa.io/stable/installation/#on-windows

Install ccontext using pipx:
```
pipx install ccontext
```

Why use pipx?

Isolated Environment: Each application runs in its own virtual environment
No Dependency Conflicts: Avoids conflicts with other Python packages
Easy Updates: Simple command to upgrade (pipx upgrade ccontext)
Clean Uninstallation: Remove everything with one command (pipx uninstall ccontext)
Global Access: Installed applications are available system-wide

Alternative: Installing from Source

If you prefer to install from source:

Clone the repository:

git clone https://github.com/oxillix/ccontext.git
cd ccontext

Set up a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

Install dependencies:
```
pip install -r requirements.txt
```
Install the package:
```
pip install .
```

Usage

Basic Usage

Run ccontext in the folder you want to collect, using the default settings defined in ~/.ccontext/config.json:
```
ccontext
```

Specify a root path, exclusions, and inclusions:

ccontext -p /path/to/directory -e ".git|node_modules" -i "important_file.txt|docs"

Command-Line Arguments

-h, --help: Show help message.
-p, --root_path: The root path to start the directory tree (default: current directory).
-e, --excludes: Additional files or directories to exclude, separated by |, e.g., node_modules|.git.
-i, --includes: Files or directories to include, separated by |, e.g., important_file.txt|docs.
-m, --max_tokens: Maximum number of tokens allowed before chunking.
-c, --config: Path to a custom configuration file.
-v, --verbose: Enable verbose output to stdout.
-ig, --ignore_gitignore: Ignore the .gitignore file for exclusions.
-g, --generate-pdf: Generate a PDF of the directory tree and file contents.
-gm, --generate-md: Generate a Markdown file of the directory tree and file contents.
--crawl: Crawls the sites specified in the config.
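
As a rough illustration of how a few of these flags could be parsed, here is a minimal sketch using Python's standard argparse module. It covers only a subset of the options, and the helper names `build_parser` and `split_patterns` are hypothetical; this is not ccontext's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a subset of the flags listed above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="ccontext-sketch")
    parser.add_argument("-p", "--root_path", default=".",
                        help="root path to start the directory tree")
    parser.add_argument("-e", "--excludes", default="",
                        help="patterns to exclude, separated by |")
    parser.add_argument("-i", "--includes", default="",
                        help="patterns to include, separated by |")
    parser.add_argument("-m", "--max_tokens", type=int, default=None,
                        help="maximum number of tokens before chunking")
    return parser


def split_patterns(value: str) -> list[str]:
    """Split a |-separated pattern string, dropping empty entries."""
    return [part for part in value.split("|") if part]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.root_path, split_patterns(args.excludes), split_patterns(args.includes))
```

Because | is a pipe character in most shells, the pattern string needs to be quoted on the command line, as in the examples in this README.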

Example

ccontext -p /home/user/project -e ".git|build" -i "README.md|src"

Configuration

Configuration File Location

ccontext looks for configuration in the following order:

1. Custom config file specified via the -c argument
2. .ccontext-config.json in the current directory
   - If present, ccontext will automatically detect and use this local configuration file
   - Create this file in the same directory where you run the ccontext command
3. ~/.ccontext/config.json (default user configuration)
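
The lookup order above can be sketched in a few lines of Python. This is an illustration of the described precedence, not ccontext's own code; the function name `resolve_config_path` and its parameters are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_config_path(cli_config: Optional[PathLike] = None,
                        cwd: Optional[PathLike] = None,
                        home: Optional[PathLike] = None) -> Path:
    """Return the config file to use, following the documented precedence."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()

    # 1. A file passed explicitly (the -c argument) always wins.
    if cli_config:
        return Path(cli_config)

    # 2. A local .ccontext-config.json in the current directory comes next.
    local = cwd / ".ccontext-config.json"
    if local.is_file():
        return local

    # 3. Otherwise fall back to the default user configuration.
    return home / ".ccontext" / "config.json"
```

The `cwd` and `home` parameters exist only to make the sketch easy to try against temporary directories.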

Configuration Options

{
  "verbose": false, // Enable detailed output
  "max_tokens": 115000, // Maximum tokens before chunking
  "model_type": "gpt-4o", // LLM model type for tokenization
  "buffer_size": 0.05, // Token buffer size (0-1)

  // System prompt for LLM context
  "context_prompt": "[[SYSTEM INSTRUCTIONS]] The following output represents...",

  // Web crawler configuration
  "urls_to_crawl": [
    {
      "url": "https://www.django-rest-framework.org/",
      "match": ["https://www.django-rest-framework.org/**"],
      "exclude": ["https://www.django-rest-framework.org/community/**"],
      "selector": "",
      "maxPagesToCrawl": 100,
      "outputFileName": "django-rest-framework.org.json",
      "maxTokens": 10000000
    }
  ],

  // Files/folders to explicitly include
  "included_folders_files": [],

  // Files/folders to exclude (supports glob patterns)
  "excluded_folders_files": [
    "**/.git",
    "**/bin",
    "**/build",
    "**/node_modules/**",
    "**/venv",
    "**/__pycache__",
    "**/package-lock.json",
    "**/ccontext.egg-info",
    "**/dist",
    "**/__tests__",
    "**/coverage",
    "**/.next",
    "**/pnpm-lock.yaml",
    "**/poetry.lock",
    "**/ccontext-output.pdf",
    "**/ccontext-output.md",
    "**/*.phpstorm.meta.php",
    "**/*.min.js",
    "**/composer.lock",
    "**/*.lock",
    "**/vendor",
    "**/laravel_access.log",
    "**/*.DS_Store",
    "**/*.tox"
  ],

  // File extensions that can be uploaded to LLMs
  "uploadable_extensions": [
    // Documents
    ".pdf",
    ".doc",
    ".docx",
    ".xls",
    ".xlsx",
    ".ppt",
    ".pptx",

    // Images
    ".jpg",
    ".jpeg",
    ".png",
    ".gif",
    ".bmp",
    ".tiff",
    ".webp",
    ".heic",

    // Audio
    ".mp3",
    ".wav",
    ".ogg",
    ".flac",
    ".aac",
    ".m4a",

    // Video
    ".mp4",
    ".mkv",
    ".avi",
    ".mov",
    ".wmv",
    ".webm",

    // Archives
    ".zip",
    ".rar",
    ".7z",
    ".tar",
    ".gz",

    // Binary/System
    ".exe",
    ".dll",
    ".iso",
    ".dmg",
    ".bin",
    ".dat",
    ".apk",
    ".img",
    ".so",
    ".swf",
    ".psd"
  ]
}
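
The listing above is annotated with // comments, which Python's standard json module rejects. Whether ccontext itself accepts comments in the config file is not stated here, so the safest option is to remove them from your own file. If you want to load an annotated copy in a script, the following sketch strips line comments while leaving string values such as URLs intact. The function names are hypothetical.

```python
import json
import re

# Match either a complete JSON string literal or a // comment up to end of line.
# Matching strings first keeps "https://..." values from being treated as comments.
_STRING_OR_COMMENT = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*')


def strip_line_comments(text: str) -> str:
    """Remove // comments that appear outside of string literals."""
    def keep_strings(match: re.Match) -> str:
        token = match.group(0)
        return token if token.startswith('"') else ""
    return _STRING_OR_COMMENT.sub(keep_strings, text)


def load_commented_json(text: str):
    """Parse JSON text that may contain // line comments."""
    return json.loads(strip_line_comments(text))
```

This handles only // line comments; block comments and trailing commas would still fail to parse.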

Understanding Glob Patterns

ccontext uses the wcmatch library for glob pattern matching, which gives you powerful but easy-to-use file matching capabilities. Here's a simple guide to using glob patterns:

Important Wildcards Explained:

* (single star): Matches anything in the current folder only

"*.txt"      # Matches: a.txt, b.txt  (in current folder)
"*.txt"      # Won't match: sub/a.txt, deep/sub/b.txt

** (double star): Matches any number of folders

"**/temp"    # Matches: temp, sub/temp, deep/sub/temp
"**/temp"    # Won't match: temp/file.txt

**/* (double star slash star): Matches everything in all folders

"**/*.txt"   # Matches: a.txt, sub/b.txt, very/deep/c.txt
"**/*"       # Matches everything, everywhere

? (question mark): Matches any single character
Literal text such as .txt matches exactly as written, so *.txt matches only files with that extension
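
To make the wildcard rules above concrete, here is a small standard-library sketch that translates a simplified glob into a regular expression. It is a stand-in written for illustration and is not how wcmatch works internally; it ignores character classes, braces, and other extended syntax, and the function names are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified glob (*, **, ?) into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")   # zero or more whole folders
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")      # anything within one folder level
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")       # exactly one character, not a slash
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # literal character
            i += 1
    return re.compile("".join(parts))


def matches(path: str, pattern: str) -> bool:
    """Return True if the whole path matches the pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None


if __name__ == "__main__":
    for path in ["a.txt", "sub/a.txt", "temp", "deep/sub/temp", "temp/file.txt"]:
        print(path, matches(path, "*.txt"), matches(path, "**/temp"))
```

For real projects, test patterns against ccontext itself, since wcmatch supports more syntax than this sketch.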

Simple Examples:

{
  "excluded_folders_files": [
    // Basic matching
    "temp.txt", // Matches exact file temp.txt
    "*.txt", // Matches all .txt files in root folder
    "**/*.txt", // Matches all .txt files in any folder

    // Folder matching
    "temp/*", // Matches everything in temp folder
    "**/temp", // Matches temp folder anywhere
    "**/temp/**", // Matches everything in any temp folder

    // Common use cases
    "**/node_modules", // Matches node_modules folders anywhere
    "**/__pycache__", // Matches Python cache folders
    "**/*.pyc", // Matches Python compiled files
    "build/*" // Matches everything in build folder
  ]
}

Tips for Beginners:
- Start simple! Use *.ext for file extensions
- Use **/ when you want to match in any folder
- Test your patterns with a small folder first
- When in doubt, be more specific
- Remember, patterns are case-sensitive

The glob system is very forgiving - if you make a mistake, it usually just won't match anything rather than causing errors. Feel free to experiment!

Configuration Options Explained

| Option | Description | Default |
| --- | --- | --- |
| verbose | Enable detailed output | false |
| max_tokens | Maximum tokens before chunking | 115000 |
| model_type | LLM model type for tokenization | "gpt-4o" |
| buffer_size | Token buffer size (0-1) | 0.05 |
| excluded_folders_files | Glob patterns for exclusion | [".git", ...] |
| included_folders_files | Glob patterns for inclusion | [] |
| uploadable_extensions | File extensions to upload | [".pdf", ...] |
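
One plausible reading of how max_tokens and buffer_size interact is that the buffer reserves a fraction of the limit, so each chunk holds at most max_tokens * (1 - buffer_size) tokens. The sketch below assumes that interpretation and uses whitespace-separated words as a stand-in for real model tokens; ccontext's actual tokenizer and chunking logic may differ.

```python
def effective_budget(max_tokens: int, buffer_size: float) -> int:
    """Tokens available per chunk after reserving the buffer (assumed formula)."""
    if not 0 <= buffer_size < 1:
        raise ValueError("buffer_size must be in the range [0, 1)")
    return round(max_tokens * (1 - buffer_size))


def chunk_tokens(tokens: list[str], max_tokens: int, buffer_size: float = 0.05) -> list[list[str]]:
    """Split a token list into consecutive chunks that fit the budget."""
    budget = effective_budget(max_tokens, buffer_size)
    if budget <= 0:
        raise ValueError("max_tokens is too small for the given buffer_size")
    return [tokens[i:i + budget] for i in range(0, len(tokens), budget)]


if __name__ == "__main__":
    words = ("lorem ipsum " * 10).split()   # 20 stand-in "tokens"
    chunks = chunk_tokens(words, max_tokens=10, buffer_size=0.2)
    print([len(chunk) for chunk in chunks])
```

A real implementation would count tokens with the tokenizer that matches model_type rather than splitting on whitespace.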

Binary File Handling

ccontext supports handling binary files through the uploadable_extensions configuration.

Supported Binary Files

Documents: .pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx
Images: .jpg, .jpeg, .png, .gif, .bmp, .tiff, .webp, .heic
Audio: .mp3, .wav, .ogg, .flac, .aac, .m4a
Video: .mp4, .mkv, .avi, .mov, .wmv, .webm
Archives: .zip, .rar, .7z, .tar, .gz
Binary/System: .exe, .dll, .iso, .dmg, .bin, .dat, .apk, .img, .so, .swf, .psd

Binary File Processing

Binary files matching uploadable_extensions are prepared for upload to LLMs
File references are automatically copied to clipboard
Most LLM providers limit prompts to a maximum of X binary files
Rate limits may apply based on your LLM provider

Example configuration for handling specific file types:

{
  "uploadable_extensions": [".pdf", ".jpg", ".png", ".xlsx"]
}
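
Conceptually, uploadable_extensions splits the collected files into two groups: those handed over as uploads and those whose text is read directly. A minimal sketch of that split, assuming a case-insensitive comparison on the file suffix (the function name is hypothetical and this is not ccontext's implementation):

```python
from pathlib import Path


def partition_by_extension(paths: list[str], uploadable_extensions: list[str]):
    """Return (uploadable, textual) lists based on each path's suffix."""
    allowed = {ext.lower() for ext in uploadable_extensions}
    uploadable, textual = [], []
    for path in paths:
        # Path.suffix yields the last extension, e.g. ".pdf" for "report.pdf".
        if Path(path).suffix.lower() in allowed:
            uploadable.append(path)
        else:
            textual.append(path)
    return uploadable, textual


if __name__ == "__main__":
    files = ["report.PDF", "main.py", "logo.png", "notes.md"]
    print(partition_by_extension(files, [".pdf", ".jpg", ".png", ".xlsx"]))
```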

Document Crawling

The crawling feature allows you to gather documentation from websites for context.

Crawler Configuration

{
  "urls_to_crawl": [
    {
      "url": "https://docs.example.com",
      "match": ["https://docs.example.com/**"],
      "exclude": ["https://docs.example.com/internal/**"],
      "selector": "",
      "maxPagesToCrawl": 100,
      "outputFileName": "docs.json",
      "maxTokens": 2000000
    }
  ]
}

Crawler Options

url: Starting URL for crawling
match: Glob patterns for URLs to include
exclude: Glob patterns for URLs to exclude
selector: CSS selector for content extraction
maxPagesToCrawl: Limit on pages to crawl
outputFileName: Name of output file
maxTokens: Maximum tokens to collect
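
The interaction between match and exclude can be pictured as a simple filter: a URL is crawled only if it matches at least one match pattern and no exclude pattern. The sketch below uses Python's fnmatch as a stand-in matcher, in which * also matches slashes, so its behavior on edge cases may differ from the crawler's own glob handling. The function name is hypothetical.

```python
from fnmatch import fnmatchcase


def should_crawl(url: str, match: list[str], exclude: list[str]) -> bool:
    """Return True if the URL is covered by match and not by exclude."""
    # Exclusions take priority over inclusions in this sketch.
    if any(fnmatchcase(url, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(url, pattern) for pattern in match)


if __name__ == "__main__":
    match = ["https://docs.example.com/**"]
    exclude = ["https://docs.example.com/internal/**"]
    for url in ["https://docs.example.com/guide/intro",
                "https://docs.example.com/internal/notes",
                "https://other.example.org/"]:
        print(url, should_crawl(url, match, exclude))
```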

Best Practices

Use specific match patterns
Respect robots.txt and site policies

Use Cases and Examples

Common Usage Patterns

Analyzing a Python Project

ccontext -p /path/to/project -e "venv|__pycache__|*.pyc"

Processing Documentation

ccontext -p ./docs --crawl -gm

Including Specific Files

ccontext -i "README.md|docs/*|*.py"

Generating PDF and Markdown

ccontext -g -gm  # Generates both PDF and Markdown
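
To give a feel for the directory tree that ends up in the generated Markdown, here is a small sketch that renders a folder as indented text lines. It is an illustration only; the real output format and the function name `render_tree` are assumptions, and exclusion patterns are not applied.

```python
from pathlib import Path


def render_tree(root, prefix: str = "") -> list[str]:
    """Return the directory tree under root as a list of text lines."""
    lines = []
    # Sort folders before files, then alphabetically, for stable output.
    entries = sorted(Path(root).iterdir(), key=lambda p: (p.is_file(), p.name))
    for index, entry in enumerate(entries):
        is_last = index == len(entries) - 1
        connector = "└── " if is_last else "├── "
        lines.append(f"{prefix}{connector}{entry.name}")
        if entry.is_dir():
            # Children of the last entry are indented without a vertical bar.
            child_prefix = prefix + ("    " if is_last else "│   ")
            lines.extend(render_tree(entry, child_prefix))
    return lines


if __name__ == "__main__":
    print("\n".join(render_tree(".")))
```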

Integration Examples

With GitHub Copilot

ccontext -p . -e "node_modules|dist" -i "src/**/*.ts"

With ChatGPT (web app has a max of 32k tokens)

ccontext -p . --max_tokens 32000

Troubleshooting

Common Issues

Clipboard Issues in SSH
- Issue: Cannot copy to clipboard in SSH session
- Solution:
  - Use SSH with X11 forwarding (ssh -X user@host), test using xeyes
  - On Mac, install XQuartz (brew install --cask xquartz)
Token Limit Exceeded
- Issue: Content too large for LLM
- Solution: Adjust max_tokens or use chunking feature
Binary File Handling
- Issue: Binary files not being processed
- Solution: Check uploadable_extensions configuration

Platform-Specific Issues

Windows: Use WSL if possible!

Otherwise:

Issue: Path separators in configuration
Solution: Use forward slashes or escaped backslashes

Linux

Issue: X11 clipboard support
Solution: Install xclip or xsel
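
A quick way to confirm that one of these tools is installed and on your PATH is a short standard-library check. The function name is hypothetical and the check only looks for the executables; it does not verify that an X11 display is available.

```python
import shutil
from typing import Optional


def find_clipboard_tool(candidates=("xclip", "xsel")) -> Optional[str]:
    """Return the full path of the first clipboard tool found, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    tool = find_clipboard_tool()
    print(tool or "No clipboard tool found; install xclip or xsel")
```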

macOS

Issue: Clipboard permissions
Solution: Grant terminal app accessibility permissions

Development Guide

Project Structure

ccontext/
├── ccontext/           # Main package directory
│   ├── __init__.py
│   ├── main.py         # Entry point
│   ├── file_tree.py    # Tree operations
│   └── ...
├── tests/              # Test directory
├── docs/               # Documentation
└── examples/           # Example configurations

Development Setup

Clone the repository
Create a virtual environment
Install development dependencies
Run tests

git clone https://github.com/oxillix/ccontext.git
# or
git clone git@github.com:NicolasArnouts/ccontext.git
cd ccontext
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
pip3 install -e .

Contributing Guidelines

Fork the repository
Create a feature branch
Write tests for new features
Submit a pull request

Code Style

Follow PEP 8 guidelines
Use isort and black
Use type hints
Keep functions focused and small

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Thanks to all contributors! 😊
Inspired by the need for better context handling in AI interactions.
Built with love and passion for the developer community! 💖

Feel free to raise issues or contribute to the project. We appreciate your support!

Happy coding adventures! 🚀 Nicolas Arnouts

Looking for a skilled freelancer? I'm available for hire! Let's collaborate — reach out to me at: arnouts.software@gmail.com

Using in WSL2 Environment

When using ccontext's web crawling feature in WSL2, you can use crawl4ai for reliable web content extraction:

First, set up the WSL2 environment:

python -m ccontext.fix_wsl

Then run the crawler as usual:

python -m ccontext --crawl

The crawler will automatically detect WSL2 and configure the environment appropriately. If you prefer to use the crawler directly:

python -m ccontext.run_crawlers --url https://example.com --output example.md
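
How ccontext detects WSL2 is not described here. A common heuristic, shown below as a sketch, is to check whether the Linux kernel release string mentions Microsoft, which is the case for WSL kernels. The function names are hypothetical.

```python
import platform


def looks_like_wsl(release: str) -> bool:
    """Return True if a kernel release string looks like a WSL kernel."""
    return "microsoft" in release.lower()


def running_in_wsl() -> bool:
    """Best-effort check for whether the current process runs under WSL."""
    return platform.system() == "Linux" and looks_like_wsl(platform.uname().release)


if __name__ == "__main__":
    print("WSL detected" if running_in_wsl() else "Not running under WSL")
```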

WSL2 Troubleshooting

If you encounter issues with the crawler in WSL2:

Ensure Python and dependencies are properly installed

Try running with explicit parameters:

python -m ccontext.run_crawlers --url https://example.com --output example.md --max-pages 10

Check that any security software isn't blocking the network connections
For more detailed logging, add the --verbose flag
