All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Smart Data File Handling: Data files (`.csv`, `.json`, `.jsonl`) now support nuanced inclusion strategies:
  - `--truncate-data-files` / `--no-truncate-data-files`: master toggle (config: `truncate_data_files`, default `true`).
  - `--data-file-threshold-lines <num>`: only truncate data files exceeding this many lines (config: `data_file_threshold_lines`, default `50`). Small datasets are now included in full.
  - `--data-file-max-per-dir <num>`: when a directory contains more same-extension data files than this, only the first N are kept and the rest are summarised in an `Omitted Data File Groups` section. Set to `0` to disable (config: `data_file_max_per_dir`, default `5`). This prevents per-record dataset layouts (50+ near-identical JSON files) from drowning out the rest of the prompt.
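
The per-directory cap described above could work roughly as in the following sketch. It is illustrative only: the helper name `cap_data_files_per_dir`, its return shape, and the use of sorted order to decide which files count as "the first N" are assumptions, not the tool's actual implementation.

```python
from collections import defaultdict
from pathlib import PurePosixPath

DATA_EXTENSIONS = {".csv", ".json", ".jsonl"}


def cap_data_files_per_dir(paths, max_per_dir=5):
    """Split paths into (kept, omitted) using a per-directory, per-extension cap.

    Hypothetical helper: non-data files are always kept, and
    max_per_dir=0 disables the cap entirely.
    """
    kept = []
    omitted = defaultdict(list)  # (directory, extension) -> omitted paths
    seen = defaultdict(int)      # (directory, extension) -> data files kept so far
    for raw in sorted(paths):    # assumption: "first N" means first in sorted order
        path = PurePosixPath(raw)
        ext = path.suffix.lower()
        if ext not in DATA_EXTENSIONS or max_per_dir == 0:
            kept.append(raw)
            continue
        key = (str(path.parent), ext)
        if seen[key] < max_per_dir:
            seen[key] += 1
            kept.append(raw)
        else:
            omitted[key].append(raw)
    return kept, dict(omitted)
```

The `omitted` mapping is grouped by directory and extension, which is the kind of information a summary section such as `Omitted Data File Groups` would need.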
- CXML output: file content containing the literal string `</document_content>` no longer breaks the surrounding XML envelope; the closing tag is now neutralized inside content blocks.
- Directory pruning: skipped directories (e.g. `node_modules`, `.git`) are now pruned at enqueue time during file discovery, avoiding unnecessary work in large monorepos.
- GitHub fetcher: errors and rate-limits are now surfaced as warnings instead of being silently swallowed. Added support for `GITHUB_TOKEN` / `GH_TOKEN` environment variables to raise the API rate limit.
- PDF extraction: pages are now joined with double newlines, so the last word of one page no longer runs into the first word of the next.
- YouTube transcripts: now compatible with both the legacy `YouTubeTranscriptApi.get_transcript()` and the newer instance-based `.fetch()` API.
- Snapshot diff: unchanged files are no longer re-hashed on every diff. The current-state index now reuses the snapshot's `sha256` when both `size` and `mtime` match, drastically speeding up diffs on large repos.
- Token-count fallback: snapshot diffs now report `0` tokens when `tiktoken` is unavailable, matching the rest of the tool instead of returning a misleading whitespace approximation.
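
The snapshot-diff speedup above relies on a common shortcut: trust the stored hash when file metadata is unchanged. A minimal sketch of that idea follows. The function name `current_sha256` and the dict layout of `snapshot_entry` are hypothetical; only the `size`, `mtime`, and `sha256` field names come from the entry above.

```python
import hashlib
import os


def current_sha256(path, snapshot_entry=None):
    """Return the file's SHA-256, reusing the snapshot value when size and mtime match.

    `snapshot_entry` is a hypothetical dict with "size", "mtime", and "sha256" keys.
    """
    stat = os.stat(path)
    if (
        snapshot_entry is not None
        and snapshot_entry.get("size") == stat.st_size
        and snapshot_entry.get("mtime") == stat.st_mtime
    ):
        # Metadata is unchanged, so skip reading the file at all.
        return snapshot_entry["sha256"]
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Hash in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The trade-off is the usual one for metadata-based caching: a file rewritten with identical size and timestamp would be treated as unchanged.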
- File Size Limits: Introduced flags to control the maximum size of individual files included in the prompt:
  - `--file-max-lines <num>`: Truncates file content after `num` lines.
  - `--file-max-bytes <num>`: Truncates file content after `num` bytes.
  - This is applied to regular files before compression is attempted, ensuring massive files don't overwhelm the context window. Files truncated by these limits are marked with a note in the final prompt.
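
As a rough illustration of how line and byte limits like these can be applied, here is a minimal sketch. The function name `truncate_content` and the wording of the truncation note are assumptions; the tool's real marker text and order of operations may differ.

```python
def truncate_content(text, max_lines=None, max_bytes=None):
    """Return (content, was_truncated) after applying optional line and byte limits."""
    truncated = False
    if max_lines is not None:
        lines = text.splitlines(keepends=True)
        if len(lines) > max_lines:
            text = "".join(lines[:max_lines])
            truncated = True
    if max_bytes is not None:
        raw = text.encode("utf-8")
        if len(raw) > max_bytes:
            # errors="ignore" drops a multi-byte character cut in half at the boundary.
            text = raw[:max_bytes].decode("utf-8", errors="ignore")
            truncated = True
    if truncated:
        # Hypothetical marker; the actual note in the prompt is not specified here.
        text += "\n[... truncated ...]\n"
    return text, truncated
```

Cutting on encoded bytes rather than characters keeps the limit meaningful for non-ASCII content, at the cost of possibly dropping one partial character at the cut point.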
- Snapshot Command: `codetoprompt snapshot <PATH> --output <snapshot.json>` to save a JSON snapshot of a local project (no git required).
  - `--output` is required for this command.
- Diff Command: `codetoprompt diff <PATH> --snapshot <snapshot.json>` to show a unified diff between the current project state and a previous snapshot.
  - By default, the diff is copied to the clipboard when no `--output` is provided (if `pyperclip` is available; on Linux, ensure `xclip` or `wl-copy` is installed).
  - Providing `--output <file>` writes the diff to the given path instead of copying to the clipboard.
- Configurable Snapshot Thresholds: Set via `codetoprompt config`:
  - `snapshot_max_bytes` (default 3 MB)
  - `snapshot_max_lines` (default 100,000)
- Diff supports `--use-snapshot-filters` to reuse include/exclude and `.gitignore` behavior from the snapshot; you can override with flags.
- Documentation: Overhauled `README.md` for improved clarity, professionalism, and visual appeal.
  - Replaced the introductory slogan with a more descriptive summary of the tool's purpose.
  - Added a prominent "Key Features" section for a quick overview of capabilities.
  - Updated the example outputs for the `analyse` command and the interactive mode (`-i`) to better reflect current functionality and showcase features.
  - Added more badges for CI status, Python versions, and license.
  - Restructured the feature sections and command-line reference for better readability.
- File Filtering: The `--include` and `--exclude` flags now correctly handle recursive folder patterns, behaving like `.gitignore` (e.g., `--exclude "my_folder"` or `--exclude "my_folder/**"` will exclude all contents).
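
To make the `.gitignore`-like folder behavior concrete, the sketch below shows one way such matching can be approximated with the standard library. It is an assumption-laden illustration: `is_excluded` is a hypothetical helper, and `fnmatch` only approximates real `.gitignore` semantics (negation and anchoring rules are not handled).

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_excluded(rel_path, patterns):
    """Return True if rel_path, or any folder above it, matches an exclude pattern."""
    path = PurePosixPath(rel_path)
    # Check the path itself plus each ancestor folder, e.g. "a/b/c.py", "a/b", "a".
    candidates = [path] + [p for p in path.parents if p != PurePosixPath(".")]
    for pattern in patterns:
        # Treat "folder/**" like "folder": both exclude everything underneath.
        base = pattern[:-3] if pattern.endswith("/**") else pattern.rstrip("/")
        for candidate in candidates:
            if fnmatch(candidate.as_posix(), base):
                return True
            # A pattern without a slash may match a folder or file name at any depth.
            if "/" not in base and fnmatch(candidate.name, base):
                return True
    return False
```

With this sketch, `is_excluded("my_folder/sub/x.py", ["my_folder"])` and `is_excluded("my_folder/sub/x.py", ["my_folder/**"])` both return `True`, while `is_excluded("src/main.py", ["my_folder"])` returns `False`.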
- Single File Input: The CLI no longer errors when given a path to a single file (e.g., `ctp my_file.py`). It now correctly processes the specified file.
- Jupyter Notebook Processing: Resolved an issue where processing `.ipynb` files could cause the tool to hang by including a necessary dependency (`ipython`).
- Jupyter Notebook Support: Automatically processes `.ipynb` files by extracting Python code from all cells and including it in the prompt. This requires the `nbformat` and `nbconvert` packages.
- Tokenization Errors: Fixed a critical bug where the tool would crash if a file's content included text that matched a special `tiktoken` token (e.g., `<|endoftext|>`). The tool now safely encodes such text.
- JavaScript Support in Web Scraping: Added JavaScript support to help scrape JavaScript-enabled websites.
- Remote URL Processing: `codetoprompt` can now process content directly from the web, in addition to local directories.
  - GitHub Repositories: Pass a GitHub URL to get a prompt of the entire codebase.
  - Web Pages: Fetch and extract the main text from any website or documentation page.
  - YouTube Videos: Automatically get the full transcript from a video URL.
  - ArXiv Papers & PDFs: Extract text from ArXiv abstract pages or direct PDF links.
- Documentation: Updated documentation.
- Documentation: Updated documentation.
- Lazy Loading for Interactive Mode: Implemented lazy loading in interactive mode to improve performance on huge codebases.
- Library Modularization: Refactored CLI functionality into modular components for improved readability and maintainability.
- Interactive Mode: Introduced a new `--interactive` (`-i`) flag that launches a file selection UI, allowing users to manually choose which files to include in the prompt. This provides fine-grained control over the context.
- Documentation: Improved and updated documentation.
- Code Compressor Bugs: Resolved issues with unsupported languages in the compression process.
- Dataset Detection: Now reads the first 5 lines of dataset files to generate more effective prompts while reducing token usage.
- Code Compressor: Refactored into modular components.
- CLI Alias: You can now invoke the CLI using `ctp` as a shorthand for `codetoprompt`.
- Version Flag: Added `--version` and `-v` flags to display the current version (e.g., `codetoprompt --version` or `ctp -v`).
- Documentation: Added details for the compression feature and the `--markdown` (`-m`) and `--cxml` (`-c`) output modes.
- Code Compression: Introduced a `--compress` flag that uses `tree-sitter` to parse supported code files (Python, JS, TS, Java, C/C++, Rust) into a structural summary. This significantly reduces the token count by omitting implementation details while preserving classes, functions, and signatures.
- Configurable Compression: The `codetoprompt config` wizard now supports setting compression as a default.
- Fallback for Unsupported Files: Files in unsupported languages (e.g., Markdown) are included in full.
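
To give a feel for what a "structural summary" looks like, the sketch below keeps class and function signatures and drops the bodies. Note the substitution: the tool uses `tree-sitter` across several languages, whereas this illustration uses Python's built-in `ast` module (Python 3.9+ for `ast.unparse`) and handles Python source only. The function name `summarize_python` and the output layout are assumptions.

```python
import ast


def summarize_python(source):
    """Return class and function signatures from Python source, omitting bodies."""
    lines = []

    def visit(node, indent):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                lines.append(f"{indent}class {child.name}:")
                # Recurse so methods appear indented under their class.
                visit(child, indent + "    ")
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                prefix = "async def" if isinstance(child, ast.AsyncFunctionDef) else "def"
                # Function bodies are replaced with "..." to save tokens.
                lines.append(f"{indent}{prefix} {child.name}({ast.unparse(child.args)}): ...")

    visit(ast.parse(source), "")
    return "\n".join(lines)
```

Functions nested inside other functions are deliberately skipped in this sketch, since they are implementation detail rather than structure.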
- Flexible Output Formats: Generate prompts in different formats using new CLI flags:
  - `--markdown` (`-m`): Outputs file contents in fenced Markdown code blocks with language hints.
  - `--cxml` (`-c`): Outputs file contents in a Claude-friendly XML structure.
- Configuration Options: The interactive wizard (`codetoprompt config`) and config file now support setting a default output format (`default`, `markdown`, or `cxml`).
- Codebase Analysis: New `codetoprompt analyze` command for in-depth project statistics.
- Enhanced Prompt Summary: Includes top files and file types by token count.
- Config Command Validation: The `config` command now prevents invalid flag usage.
- UI Improvements: Panels now correctly size to their content.
- Interactive Configuration Wizard: Accessed via `codetoprompt config`.
- Config Management: View and reset config with `codetoprompt config --show` and `codetoprompt config --reset`.
- Persistent Configuration File: Stored at `~/.config/codetoprompt/config.toml`.
- CLI Default Behavior: Running `codetoprompt` with no arguments now shows the help menu.
- Project Structure Tree: A visual tree of the project structure is now included in the prompt.
- CLI Output Improvements: Enhanced progress bar and more user feedback.
- Multiple bugs related to tree generation, depth handling, and relative paths.
- Clipboard and output file handling issues.
- File Processing: More robust handling of special tokens, problematic files, and file read errors.
- CLI Enhancements: Added directory validation and improved argument parsing.
- Packaging & CI: Corrected `pyproject.toml` and GitHub Actions workflows.
- Multiple bugs in token counting and file handling logic.
- Initial release.
- Core functionality for converting a codebase into a single prompt.
- File filtering, token counting, and a command-line interface.