Recoco is a pure Rust fork of the excellent CocoIndex, a high-performance, incremental ETL and data processing framework.
> [!TIP]
> Full documentation — guides, examples, API reference, and more — is at docs.knitli.com/recoco/.
I decided to create a Rust-only fork of CocoIndex for a couple of reasons:
- **CocoIndex is not a Rust library.** CocoIndex is written in Rust, but it does not expose a Rust API, and its packaging, documentation, and examples focus entirely on Python. Only a limited API surfaces through its Rust extensions, and it isn't even released on crates.io.
- **CocoIndex is heavy.** CocoIndex has several very heavy dependencies, and unless you are actually Google, you probably don't need all of them: large Google/AWS/Azure components, Qdrant/Postgres/Neo4j clients, and more.
For Knitli, I needed dependency control. I wanted to use CocoIndex as an ETL engine for Thread, but Thread needs to be edge-deployable. The dependencies were way too heavy and would never compile to WASM. Thread, of course, is also a Rust project, so pulling in a lot of Python dependencies didn't make sense for me.
> [!NOTE]
> Knitli and Recoco have no official relationship with CocoIndex, and CocoIndex does not endorse this project. We will contribute as much as we can upstream; our contribution guidelines encourage you to submit PRs and issues affecting shared code upstream, to help both projects.
- **Recoco fully exposes a Rust API.** You can use Recoco to support your Rust ETL projects directly. Build on it.
- **Every target, source, and function (i.e., transform) is independently feature-gated.** Use only what you want.
The minimum install now uses 600 fewer crates (820 → 220) — a ~73% reduction from CocoIndex.
We will regularly merge in upstream fixes and changes, particularly those affecting sources, targets, and functions.
- 🦀 Pure Rust: No Python dependencies, interpreters, or build tools required
- 🎯 Modular Architecture: Feature-gated sources, targets, and functions — use only what you need
- ⚡ Incremental Processing: Dataflow engine that processes only changed data; tracks lineage automatically
- 🚀 Additional Optimizations: Faster alternatives where possible (e.g., `blake2` → `blake3`)
- 📦 Workspace Structure: Clean separation into `recoco`, `recoco-utils`, and `recoco-splitters` crates
- 🔌 Rich Connector Ecosystem: Local Files, PostgreSQL, S3, Azure, Google Drive, Qdrant, Neo4j, Kùzu, and more
- 🌐 Async API: Fully async/await compatible, built on Tokio
See the Core Crate Reference for a complete list of available features.
- RAG Pipelines: Ingest documents, split intelligently, generate embeddings, store in vector databases
- ETL Workflows: Extract, transform, and load across multiple data stores
- Document Processing: Parse, chunk, and extract information from large document collections
- Data Synchronization: Keep data in sync across systems with automatic change detection
- Custom Pipelines: Build domain-specific data flows with custom Rust operations
Add recoco to your Cargo.toml, enabling only the features you need:

```toml
[dependencies]
recoco = { version = "0.2", default-features = false, features = ["source-local-file", "function-split"] }
```

For the full list of available features — sources, targets, functions, LLM providers, splitter languages, and capability bundles — see the Core Crate Reference.
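Alternatively, a sketch of the same step from the command line, assuming a cargo version that ships the `cargo add` subcommand (1.62+); the feature names are the ones from the snippet above:

```shell
# Add recoco with default features disabled and only the features
# this example needs (feature names taken from the snippet above).
cargo add recoco@0.2 --no-default-features --features source-local-file,function-split
```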
```rust
use recoco::prelude::*;
use recoco::builder::FlowBuilder;
use recoco::execution::evaluator::evaluate_transient_flow;
use serde_json::json;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Initialize the library context with default settings.
    recoco::lib_context::init_lib_context(Some(recoco::settings::Settings::default())).await?;

    // Declare a flow with a single string input field named "text".
    let mut builder = FlowBuilder::new("hello_world").await?;
    let input = builder.add_direct_input(
        "text".to_string(),
        schema::make_output_type(schema::BasicValueType::Str),
    )?;

    // Transform the input with the SplitBySeparators function, splitting on spaces.
    let output = builder.transform(
        "SplitBySeparators".to_string(),
        json!({ "separators_regex": [" "] }).as_object().unwrap().clone(),
        vec![(input, Some("text".to_string()))],
        None,
        "splitter".to_string(),
    ).await?;
    builder.set_direct_output(output)?;

    // Build the flow and evaluate it once on a single input value.
    let flow = builder.build_transient_flow().await?;
    let result = evaluate_transient_flow(
        &flow.0,
        &vec![value::Value::Basic("Hello Recoco".into())],
    ).await?;
    println!("Result: {:?}", result);
    Ok(())
}
```

For step-by-step guidance, custom operations, file processing, and more, visit the Getting Started guide and Examples.
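The quickstart pulls in a few crates beyond recoco itself. A minimal manifest sketch for it might look like the following; the tokio feature names are an assumption (`macros` and `rt-multi-thread` are the standard features `#[tokio::main]` requires), and the recoco feature names are taken from the installation snippet above:

```toml
[package]
name = "hello-recoco"
version = "0.1.0"
edition = "2021"

[dependencies]
# Feature names taken from the installation snippet above.
recoco = { version = "0.2", default-features = false, features = ["source-local-file", "function-split"] }
# `#[tokio::main]` needs a macro and a runtime; these are
# standard tokio feature names, assumed here.
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
anyhow = "1"
serde_json = "1"
```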
- WASM Support: Compile core logic to WASM for edge deployment
- More Connectors: Add support for Redis, ClickHouse, and more
- UI Dashboard: Simple web UI for monitoring flows
- Upstream Sync: Regular merges from upstream CocoIndex
Recoco is a fork of CocoIndex:
| Aspect | CocoIndex (Upstream) | Recoco (Fork) |
|---|---|---|
| Primary Language | Python with Rust core | Pure Rust |
| API Surface | Python-only | Full Rust API |
| Distribution | Not on crates.io | Published to crates.io |
| Dependencies | All bundled together | Feature-gated and modular |
| Target Audience | Python developers | Rust developers |
| License | Apache-2.0 | Apache-2.0 |
We aim to maintain compatibility with CocoIndex's core dataflow engine to allow porting upstream improvements, while diverging significantly in the API surface and dependency management to better serve Rust users.
Code headers maintain dual copyright (CocoIndex upstream, Knitli Inc. for Recoco modifications) under Apache-2.0.
Contributions are welcome! Please see CONTRIBUTING.md and the Contributing guide for details.
Apache License 2.0; see NOTICE for full license text.
This project is REUSE 3.3 compliant.
Recoco is built on the excellent foundation provided by CocoIndex. We're grateful to the CocoIndex team for creating such a powerful and well-designed dataflow engine.
Built with 🦀 by Knitli Inc.
Docs • API Reference • Crates.io • GitHub • Issues
