A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.
Updated Nov 25, 2025 · Python
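As a rough illustration of what benchmarking chunking strategies on Markdown can mean, the sketch below runs two simple strategies over the same text and reports chunk count and average chunk length. The strategy functions (`fixed_size_chunks`, `paragraph_chunks`), `benchmark`, and the chosen metrics are hypothetical and are not the CLI's actual interface.

```python
from statistics import mean
from typing import Callable


def fixed_size_chunks(text: str, size: int = 200) -> list[str]:
    # Hypothetical strategy: fixed windows of `size` characters, no overlap.
    return [text[i:i + size] for i in range(0, len(text), size)]


def paragraph_chunks(text: str) -> list[str]:
    # Hypothetical strategy: split on blank lines (Markdown paragraph breaks).
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def benchmark(
    text: str, strategies: dict[str, Callable[[str], list[str]]]
) -> dict[str, dict[str, float]]:
    # Simple size statistics per strategy. A fuller benchmark would also
    # score retrieval quality against a set of labeled queries.
    results = {}
    for name, strategy in strategies.items():
        chunks = strategy(text)
        results[name] = {
            "count": len(chunks),
            "avg_chars": mean(len(c) for c in chunks) if chunks else 0.0,
        }
    return results


if __name__ == "__main__":
    doc = "# Title\n\nFirst paragraph.\n\nSecond paragraph is longer."
    stats = benchmark(doc, {"fixed": fixed_size_chunks, "paragraph": paragraph_chunks})
    for name, row in stats.items():
        print(name, row)
```

Size statistics alone do not identify the "best" strategy; they are only a cheap first filter before retrieval-quality evaluation.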
Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.
A lightweight Python library for metadata-rich document chunking in Retrieval-Augmented Generation (RAG) workflows. It uses Azure AI Document Intelligence to retain hierarchical structure, page numbers, and bounding boxes in each chunk, so chunks can be traced back to their location in a PDF viewer.
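The idea of metadata-rich chunks can be sketched without any cloud service: given layout elements that have already been extracted (headings with levels, paragraphs with page numbers), group paragraphs under their nearest heading and keep the heading path and pages alongside the text. The `Element` and `Chunk` types and `chunk_with_metadata` are hypothetical; this does not call Azure AI Document Intelligence, and bounding boxes are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    # Hypothetical pre-parsed layout element (e.g. output of a layout-analysis step).
    kind: str        # "heading" or "paragraph"
    text: str
    page: int
    level: int = 0   # heading level; only meaningful when kind == "heading"


@dataclass
class Chunk:
    text: str
    heading_path: list[str]
    pages: list[int] = field(default_factory=list)


def chunk_with_metadata(elements: list[Element]) -> list[Chunk]:
    """Group consecutive paragraphs under their nearest heading, recording the
    full heading path and every page the chunk spans."""
    chunks: list[Chunk] = []
    path: list[str] = []
    current: Optional[Chunk] = None
    for el in elements:
        if el.kind == "heading":
            # Keep ancestors above this level, then push this heading.
            path = path[: el.level - 1] + [el.text]
            current = None  # a new heading starts a new chunk
        elif current is None:
            current = Chunk(text=el.text, heading_path=list(path), pages=[el.page])
            chunks.append(current)
        else:
            current.text += "\n" + el.text
            if el.page not in current.pages:
                current.pages.append(el.page)
    return chunks
```

Storing the heading path with each chunk lets a retriever show section context, and the page list is what a viewer integration would use to jump to the source.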
Production-ready Snowflake RAG system with type-specific chunking.
My complete LangChain learning journey, from basics to advanced RAG, LCEL, LangGraph, LangServe, and LangSmith, with hands-on code examples.
This repository provides a fully modular implementation of a Retrieval-Augmented Generation (RAG) pipeline tailored for Italian legal-domain documents.
Smart text chunking tool for RAG systems. Splits long texts into sentence-based chunks with ~10%-15% overlap for better context retention. Runs fully in-browser with a clean UI and copyable outputs.
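Sentence-based chunking with overlap can be sketched as: pack whole sentences into a chunk until a size limit is reached, then start the next chunk with the trailing sentences that fit in an overlap budget (about 12% of the limit by default, inside the 10-15% range described). The regex sentence splitter and the character-based limit are simplifying assumptions, and `sentence_chunks` is a hypothetical name, not the tool's code.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_chunks(text: str, max_chars: int = 500, overlap: float = 0.12) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters
    (a single longer sentence becomes its own chunk). Each new chunk repeats
    trailing sentences from the previous one, up to overlap * max_chars."""
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences_or_empty(text):
        if current and len(" ".join(current + [sentence])) > max_chars:
            chunks.append(" ".join(current))
            # Carry over trailing sentences that fit in the overlap budget.
            budget = overlap * max_chars
            carried: list[str] = []
            for prev in reversed(current):
                if len(" ".join([prev] + carried)) > budget:
                    break
                carried.insert(0, prev)
            # Drop the overlap if it would push the new chunk over the limit.
            if len(" ".join(carried + [sentence])) > max_chars:
                carried = []
            current = carried
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def sentences_or_empty(text: str) -> list[str]:
    # Thin wrapper so empty input simply yields no sentences.
    return split_sentences(text) if text.strip() else []
```

Overlapping by whole sentences, rather than by a fixed character count, keeps every chunk boundary on a sentence boundary.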
📝 Parse, chunk, and evaluate Markdown for RAG pipelines with token-accurate support and flexible strategies for optimal context management.
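A minimal version of Markdown-aware chunking splits the document at ATX headings (`#` to `######`), then packs each section's paragraphs into chunks under a token limit and records the token count per chunk. The whitespace-based `count_tokens` is a stand-in assumption; token-accurate counting would use the target model's tokenizer instead. `markdown_sections` and `chunk_markdown` are hypothetical names, and `#` lines inside fenced code blocks are not handled in this sketch.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def count_tokens(text: str) -> int:
    # Stand-in counter: whitespace-separated words, not real model tokens.
    return len(text.split())


def _has_text(lines: list[str]) -> bool:
    return any(line.strip() for line in lines)


def markdown_sections(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown at ATX headings into (heading, body) pairs."""
    sections: list[tuple[str, str]] = []
    heading, body = "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            if heading or _has_text(body):
                sections.append((heading, "\n".join(body).strip()))
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    if heading or _has_text(body):
        sections.append((heading, "\n".join(body).strip()))
    return sections


def chunk_markdown(markdown: str, max_tokens: int = 256) -> list[dict]:
    """Pack paragraphs within each section into chunks of at most max_tokens
    (a single larger paragraph becomes its own chunk)."""
    chunks = []
    for heading, body in markdown_sections(markdown):
        buffer: list[str] = []
        for para in [p for p in body.split("\n\n") if p.strip()]:
            if buffer and count_tokens("\n\n".join(buffer + [para])) > max_tokens:
                text = "\n\n".join(buffer)
                chunks.append({"heading": heading, "text": text, "tokens": count_tokens(text)})
                buffer = []
            buffer.append(para)
        if buffer:
            text = "\n\n".join(buffer)
            chunks.append({"heading": heading, "text": text, "tokens": count_tokens(text)})
    return chunks
```

Splitting on headings first keeps chunks from straddling sections; the token limit then only governs how long sections are subdivided.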