217 changes: 217 additions & 0 deletions .gitignore
@@ -0,0 +1,217 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[codz]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py.cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
# Pipfile.lock

# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# uv.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
# poetry.lock
# poetry.toml

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
# pdm.lock
# pdm.toml
.pdm-python
.pdm-build/

# pixi
# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
# pixi.lock
# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
# in the .venv directory. It is recommended not to include this directory in version control.
.pixi

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# Redis
*.rdb
*.aof
*.pid

# RabbitMQ
mnesia/
rabbitmq/
rabbitmq-data/

# ActiveMQ
activemq-data/

# SageMath parsed files
*.sage.py

# Environments
.env
.envrc
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
# .idea/

# Abstra
# Abstra is an AI-powered process automation framework.
# Ignore directories containing user credentials, local state, and settings.
# Learn more at https://abstra.io/docs
.abstra/

# Visual Studio Code
# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
# and can be added to the global gitignore or merged into this file. However, if you prefer,
# you could uncomment the following to ignore the entire vscode folder
# .vscode/

# Ruff stuff:
.ruff_cache/

# PyPI configuration file
.pypirc

# Marimo
marimo/_static/
marimo/_lsp/
__marimo__/

# Streamlit
.streamlit/secrets.toml
.serena/
65 changes: 65 additions & 0 deletions README.md
@@ -0,0 +1,65 @@
# CodeContexter

CodeContexter is a command-line tool that walks a source tree, filters files through `.gitignore` rules, and emits a single Markdown report (`code_summary.md`) with file metadata, statistics, and inline source listings.

## Features
- Reuses project ignore rules, combines them with the built-in `ALWAYS_IGNORE_PATTERNS`, and supports nested `.gitignore` files discovered during traversal.
- Detects file language and category using the tables in `codecontexter/constants.py`, then reports totals by both dimensions.
- Shows progress with `tqdm` when available; falls back to a simple iterator when the dependency is missing.
- Optionally appends per-file SHA-256 hashes for audit trails.
- Ships a `codecontexter` console script for Python 3.12+ via the `pyproject.toml` entry point.
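
The language and category totals described above could be computed along these lines. This is a minimal sketch, not the project's implementation: the table contents, the `classify` and `tally` helpers, and the `"Unknown"`/`"other"` fallbacks are all assumptions made for illustration, since the real tables in `codecontexter/constants.py` are not shown here.

```python
from collections import Counter
from pathlib import Path

# Hypothetical excerpts; the real tables live in codecontexter/constants.py
# and their actual keys and shape may differ.
LANGUAGE_MAP = {".py": "Python", ".md": "Markdown", ".toml": "TOML"}
FILE_CATEGORIES = {"Python": "source", "Markdown": "docs", "TOML": "config"}


def classify(path: Path) -> tuple[str, str]:
    """Return (language, category), using assumed fallbacks for unknown types."""
    language = LANGUAGE_MAP.get(path.suffix.lower(), "Unknown")
    category = FILE_CATEGORIES.get(language, "other")
    return language, category


def tally(paths: list[Path]) -> tuple[Counter, Counter]:
    """Count files per language and per category."""
    by_language: Counter = Counter()
    by_category: Counter = Counter()
    for path in paths:
        language, category = classify(path)
        by_language[language] += 1
        by_category[category] += 1
    return by_language, by_category
```

Keying the lookup on the lower-cased suffix keeps `.PY` and `.py` in the same bucket; whether the tool normalises case this way is not stated in this README.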

## Installation
```bash
pip install codecontexter
# or
uv tool install codecontexter
```

Runtime dependency: `pathspec`. Optional dependency: `tqdm` for progress bars.

## Command-line usage
```bash
codecontexter DIRECTORY \
[--output OUTPUT_PATH] \
[--verbose] \
[--no-metadata-table] \
[--include-hash]
```

| Option | Description |
| --- | --- |
| `DIRECTORY` | Directory to scan (required positional argument). |
| `-o, --output` | Output Markdown path. Defaults to `code_summary.md`. |
| `-v, --verbose` | Print each processed file with size and line count. |
| `--no-metadata-table` | Skip the per-file overview table. |
| `--include-hash` | Compute SHA-256 for each file before writing the report. |

The CLI reports the resolved project root and the `.gitignore` file in use, and it shows progress for scanning and writing when `tqdm` is present.
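
An optional progress bar of this kind is commonly wired up with a guarded import. The sketch below shows that pattern only; the `progress` helper name and its signature are hypothetical and are not taken from the CodeContexter source.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

try:
    from tqdm import tqdm  # optional dependency
except ImportError:
    tqdm = None  # fall back to plain iteration


def progress(items: Iterable[T], description: str = "") -> Iterator[T]:
    """Iterate over items, showing a tqdm bar only when tqdm is installed."""
    if tqdm is None:
        return iter(items)
    return iter(tqdm(items, desc=description))
```

Callers can then write `for path in progress(paths, "Scanning"):` and get the same items either way.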

## Report layout
- **Header** – Repository name, generation timestamp, and source directory path.
- **Statistics** – Total files, total lines, total size, plus breakdowns by category and language.
- **File Metadata table** *(optional)* – File path, size, lines, language, category, and last modification timestamp.
- **Table of Contents** – Links to each file section using GitHub Flavored Markdown anchors.
- **File sections** – For each file: language label, size, line count, category, optional SHA-256, and a fenced code block with the exact file contents.
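
For the Table of Contents links, heading anchors can be approximated as follows. This is a rough sketch of GitHub's heading-anchor behaviour (lower-case, drop punctuation, turn spaces into hyphens, suffix duplicates), not a specification of it and not the tool's own code; `gfm_anchor` and the `seen` dictionary are hypothetical names.

```python
import re


def gfm_anchor(heading: str, seen: dict[str, int]) -> str:
    """Approximate a GitHub-style heading anchor, de-duplicating repeats."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # drop punctuation such as "/" and "."
    slug = slug.replace(" ", "-")
    count = seen.get(slug, 0)
    seen[slug] = count + 1
    # Repeated headings get a numeric suffix: "slug", "slug-1", "slug-2", ...
    return slug if count == 0 else f"{slug}-{count}"
```

Because file paths lose their separators under these rules, two different paths can collapse to the same slug, which is why the duplicate counter matters for a report with one heading per file.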

## Configuration points
- Language detection and categorisation live in `codecontexter/constants.py` (`LANGUAGE_MAP` and `FILE_CATEGORIES`). Extend these tables if your project relies on additional file types.
- Permanent ignore rules are defined in `ALWAYS_IGNORE_PATTERNS`. Add directories or patterns there to exclude them globally.
- Hashing is disabled by default; enable `--include-hash` when you need verifiable snapshots.
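
A per-file SHA-256 digest like the one `--include-hash` adds can be computed with the standard library. This is a sketch under assumptions: the `sha256_of` name and the chunk size are illustrative, and the tool may read files differently.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use flat for large files, at the cost of one extra pass over every file in the report.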

## Development
```bash
uv sync
source .venv/bin/activate
ruff check
python -m codecontexter.cli .
```

The project currently ships without automated tests. To verify a change, run the CLI against a fixture project. `pytest` is listed in the `dev` extra so that tests can be added.

## Roadmap ideas
- Allow size limits per file to avoid embedding large binaries.
- Offer an HTML writer alongside the Markdown generator.
- Group files by package or module to provide higher-level structure summaries.