knitli/recoco

Recoco

Incremental ETL and Data Processing Framework for Rust


Recoco is a pure Rust fork of the excellent CocoIndex, a high-performance, incremental ETL and data processing framework.

Tip

Full documentation — guides, examples, API reference, and more — is at docs.knitli.com/recoco/.

Why Fork?

I decided to create a Rust-only fork of CocoIndex for a couple of reasons:

  1. CocoIndex is not a Rust library. CocoIndex is written in Rust, but it does not expose a Rust API, and its packaging, documentation, and examples focus only on Python. It exposes a more limited API through its Rust extensions. It's not even released on crates.io.

  2. CocoIndex is heavy. CocoIndex has several very heavy dependencies and unless you are actually Google, you probably don't need all of them. These include large packages like Google/AWS/Azure components, Qdrant/Postgres/Neo4j, and more.

For Knitli, I needed dependency control. I wanted to use CocoIndex as an ETL engine for Thread, but Thread needs to be edge-deployable. The dependencies were way too heavy and would never compile to WASM. Thread, of course, is also a Rust project, so pulling in a lot of Python dependencies didn't make sense for me.

Note

Knitli and Recoco have no official relationship with CocoIndex, and CocoIndex does not endorse this project. We will contribute as much as we can upstream; our contribution guidelines encourage you to submit PRs and issues affecting shared code upstream to help both projects.

How Recoco is Different from CocoIndex

  1. Recoco fully exposes a Rust API. You can use Recoco to support your Rust ETL projects directly. Build on it.

  2. Every target, source, and function (i.e. transform) is independently feature-gated. Use only what you want.

The minimum install now uses 600 fewer crates (820 → 220) — a ~73% reduction from CocoIndex.

We will regularly merge in upstream fixes and changes, particularly sources, targets, and functions.
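As a rough illustration of what feature gating means in practice, here is a minimal sketch of how a Rust crate can compile components in or out with `#[cfg(feature = "...")]`. This is not Recoco's actual source layout: the feature names are borrowed from the Installation snippet, and the module names and the `enabled_components` helper are hypothetical.

```rust
// Illustrative only: NOT Recoco's real module structure.
// Module and function names here are hypothetical.

/// Compiled only when the `source-local-file` feature is enabled.
#[cfg(feature = "source-local-file")]
pub mod local_file_source {
    pub const NAME: &str = "source-local-file";
}

/// Compiled only when the `function-split` feature is enabled.
#[cfg(feature = "function-split")]
pub mod split_function {
    pub const NAME: &str = "function-split";
}

/// Lists the gated components that were compiled into this build.
pub fn enabled_components() -> Vec<&'static str> {
    // `mut` is unused when no features are enabled, so silence the lint.
    #[allow(unused_mut)]
    let mut enabled = Vec::new();
    #[cfg(feature = "source-local-file")]
    enabled.push(local_file_source::NAME);
    #[cfg(feature = "function-split")]
    enabled.push(split_function::NAME);
    enabled
}
```

Code behind a disabled feature is never compiled, so its dependencies never enter the build graph. That is the general mechanism by which a feature-gated crate can shrink its dependency count.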

✨ Key Features

  • 🦀 Pure Rust: No Python dependencies, interpreters, or build tools required
  • 🎯 Modular Architecture: Feature-gated sources, targets, and functions — use only what you need
  • Incremental Processing: Dataflow engine that processes only changed data; tracks lineage automatically
  • 🚀 Additional optimizations: Faster alternatives where possible (e.g., blake2 → blake3)
  • 📦 Workspace Structure: Clean separation into recoco, recoco-utils, and recoco-splitters crates
  • 🔌 Rich Connector Ecosystem: Local Files, PostgreSQL, S3, Azure, Google Drive, Qdrant, Neo4j, Kùzu, and more
  • 🌐 Async API: Fully async/await compatible, built on Tokio

See the Core Crate Reference for a complete list of available features.
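To make "processes only changed data" concrete, here is a dependency-free sketch of one common approach: fingerprint each row's content and skip rows whose fingerprint has not changed. This is a conceptual illustration, not Recoco's engine. `ChangeTracker` and `fingerprint` are hypothetical names, the standard library's `DefaultHasher` stands in for a stable content hash, and the sketch ignores deletions and lineage tracking.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Fingerprint of a row's content. `DefaultHasher` keeps the sketch
/// dependency-free; it is an assumption, not what Recoco uses.
fn fingerprint(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical tracker: remembers the last fingerprint seen per source key.
#[derive(Default)]
pub struct ChangeTracker {
    seen: HashMap<String, u64>,
}

impl ChangeTracker {
    /// Returns the keys whose content is new or changed since the last call,
    /// and records their fingerprints so unchanged rows are skipped next time.
    pub fn changed_keys<'a>(&mut self, rows: &[(&'a str, &str)]) -> Vec<&'a str> {
        let mut changed = Vec::new();
        for &(key, content) in rows {
            let fp = fingerprint(content);
            if self.seen.get(key) != Some(&fp) {
                self.seen.insert(key.to_string(), fp);
                changed.push(key);
            }
        }
        changed
    }
}
```

On a first pass every key is reported as changed. On later passes only keys with modified content come back, so downstream transforms can be limited to those rows.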

🎯 Use Cases

  • RAG Pipelines: Ingest documents, split intelligently, generate embeddings, store in vector databases
  • ETL Workflows: Extract, transform, and load across multiple data stores
  • Document Processing: Parse, chunk, and extract information from large document collections
  • Data Synchronization: Keep data in sync across systems with automatic change detection
  • Custom Pipelines: Build domain-specific data flows with custom Rust operations

Installation

Add recoco to your Cargo.toml, enabling only the features you need:

[dependencies]
recoco = { version = "0.2", default-features = false, features = ["source-local-file", "function-split"] }

For the full list of available features — sources, targets, functions, LLM providers, splitter languages, and capability bundles — see the Core Crate Reference.

🚀 Quick Start

use recoco::prelude::*;
use recoco::builder::FlowBuilder;
use recoco::execution::evaluator::evaluate_transient_flow;
use serde_json::json;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Initialize the library context with default settings.
    recoco::lib_context::init_lib_context(Some(recoco::settings::Settings::default())).await?;

    // Start building a flow named "hello_world".
    let mut builder = FlowBuilder::new("hello_world").await?;

    // Declare a direct input called "text" holding a string value.
    let input = builder.add_direct_input(
        "text".to_string(),
        schema::make_output_type(schema::BasicValueType::Str),
    )?;

    // Apply the SplitBySeparators transform, splitting on spaces.
    let output = builder.transform(
        "SplitBySeparators".to_string(),
        json!({ "separators_regex": [" "] }).as_object().unwrap().clone(),
        vec![(input, Some("text".to_string()))],
        None,
        "splitter".to_string(),
    ).await?;

    // Expose the transform result as the flow's output.
    builder.set_direct_output(output)?;

    // Build the transient flow and evaluate it against one input value.
    let flow = builder.build_transient_flow().await?;
    let result = evaluate_transient_flow(
        &flow.0,
        &vec![value::Value::Basic("Hello Recoco".into())]
    ).await?;

    println!("Result: {:?}", result);
    Ok(())
}
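For intuition about what separator-based splitting does to the Quick Start input, here is a standard-library-only approximation. It is not Recoco's `SplitBySeparators` implementation: it matches a single literal character rather than a regex, `split_with_offsets` is a hypothetical helper, and Recoco's real output format may differ.

```rust
/// Splits `text` on a single literal separator character and returns each
/// non-empty piece together with its starting byte offset.
pub fn split_with_offsets(text: &str, separator: char) -> Vec<(usize, &str)> {
    let mut pieces = Vec::new();
    let mut start = 0;
    for (idx, ch) in text.char_indices() {
        if ch == separator {
            // Skip empty pieces produced by consecutive separators.
            if idx > start {
                pieces.push((start, &text[start..idx]));
            }
            start = idx + ch.len_utf8();
        }
    }
    // Emit the trailing piece, if any.
    if start < text.len() {
        pieces.push((start, &text[start..]));
    }
    pieces
}
```

Calling `split_with_offsets("Hello Recoco", ' ')` yields the pieces `"Hello"` at offset 0 and `"Recoco"` at offset 6.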

For step-by-step guidance, custom operations, file processing, and more, visit the Getting Started guide and Examples.

🗺️ Roadmap

  • WASM Support: Compile core logic to WASM for edge deployment
  • More Connectors: Add support for Redis, ClickHouse, and more
  • UI Dashboard: Simple web UI for monitoring flows
  • Upstream Sync: Regular merges from upstream CocoIndex

🔗 Relationship to CocoIndex

Recoco is a fork of CocoIndex:

| Aspect | CocoIndex (Upstream) | Recoco (Fork) |
|---|---|---|
| Primary Language | Python with Rust core | Pure Rust |
| API Surface | Python-only | Full Rust API |
| Distribution | Not on crates.io | Published to crates.io |
| Dependencies | All bundled together | Feature-gated and modular |
| Target Audience | Python developers | Rust developers |
| License | Apache-2.0 | Apache-2.0 |

We aim to maintain compatibility with CocoIndex's core dataflow engine to allow porting upstream improvements, while diverging significantly in the API surface and dependency management to better serve Rust users.

Code headers maintain dual copyright (CocoIndex upstream, Knitli Inc. for Recoco modifications) under Apache-2.0.

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md and the Contributing guide for details.

📄 License

Apache License 2.0; see NOTICE for full license text.

This project is REUSE 3.3 compliant.

🙏 Acknowledgments

Recoco is built on the excellent foundation provided by CocoIndex. We're grateful to the CocoIndex team for creating such a powerful and well-designed dataflow engine.


Built with 🦀 by Knitli Inc.

Docs · API Reference · Crates.io · GitHub · Issues
