Merged
Cargo.toml: 4 changes (2 additions, 2 deletions)
@@ -1,8 +1,8 @@
[package]
-name = "rust-connect"
+name = "franzoxide"
version = "0.1.0"
edition = "2021"
-description = "A Kafka Connect clone written in Rust with gRPC interface"
+description = "Franzoxide: A high-performance Kafka Connect clone written in Rust"
authors = ["Laurent Valdes"]

[dependencies]
GAP.md: 60 changes (60 additions, 0 deletions)
@@ -0,0 +1,60 @@
# Franzoxide Gap Analysis

This document identifies the gaps between the current implementation of Franzoxide and the target feature set for S3 sink connectors, focusing on Parquet partitioning and Apache Iceberg integration.

## Feature Gap Analysis

| Priority | Feature | Description | Status | Complexity |
|----------|---------|-------------|--------|------------|
| 1 | Time-Based Partitioning | Implement TimeBasedPartitioner to create Hive-compatible partitions (year/month/day/hour) | Not Started | Medium |
| 2 | Parquet File Generation | Support for writing Parquet files with schema information and compression | Partial | High |
| 3 | S3 Upload Management | Configurable flush strategies and atomic file operations | Partial | Medium |
| 4 | Schema Registry Integration | Integrate with AWS Glue Schema Registry to support schema evolution | Not Started | High |
| 5 | Partition Management | Direct management of partitions without relying on external crawlers | Not Started | Medium |
| 6 | Iceberg Basic Support | Implement basic Apache Iceberg table format support | Not Started | Very High |
| 7 | Exactly-Once Semantics | Ensure records are written exactly once, even during failures | Not Started | High |
| 8 | AWS Glue Catalog Integration | Integration with AWS Glue Data Catalog for table management | Not Started | Medium |
| 9 | Multi-Table Fan-Out | Support routing different records to different tables | Not Started | Medium |
| 10 | Schema Evolution | Automatic handling of schema changes in streaming data | Not Started | High |
| 11 | Commit Coordination | Implement commit coordination through Kafka control topics | Not Started | High |
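As a starting point for the priority-1 item, the mapping a TimeBasedPartitioner performs can be sketched with the standard library alone. This is a minimal sketch, assuming record timestamps arrive as milliseconds since the Unix epoch and partitions are computed in UTC. `hive_partition_path` and `civil_from_days` are hypothetical names, and the date arithmetic uses Howard Hinnant's well-known `civil_from_days` algorithm instead of a date crate such as `chrono`.

```rust
/// Converts a count of days since 1970-01-01 into a proleptic Gregorian
/// (year, month, day), following Howard Hinnant's `civil_from_days` algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    // Shift the epoch to 0000-03-01 so that leap days fall at the end of each year.
    let z = days + 719_468;
    let era = z.div_euclid(146_097); // 400-year eras, floor division for negative inputs
    let doe = (z - era * 146_097) as u64; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // March-based day of year, [0, 365]
    let mp = (5 * doy + 2) / 153; // month index with March = 0, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    // January and February belong to the following civil year.
    let year = yoe as i64 + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Hypothetical partitioner: maps a record timestamp (milliseconds since the
/// Unix epoch, interpreted as UTC) to a Hive-style prefix such as
/// `year=2023/month=11/day=14/hour=22`.
fn hive_partition_path(timestamp_ms: i64) -> String {
    let secs = timestamp_ms.div_euclid(1_000);
    let days = secs.div_euclid(86_400);
    let hour = secs.rem_euclid(86_400) / 3_600;
    let (year, month, day) = civil_from_days(days);
    format!("year={year:04}/month={month:02}/day={day:02}/hour={hour:02}")
}
```

For example, `hive_partition_path(1_700_000_000_000)` yields `year=2023/month=11/day=14/hour=22`. Zero-padding every component keeps the lexical order of S3 prefixes identical to chronological order.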

## Implementation Roadmap

### Phase 1: Basic Parquet Partitioning
- Implement TimeBasedPartitioner
- Complete Parquet file generation with compression options
- Enhance S3 upload management with configurable flush strategies
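The configurable flush strategies above could be modeled as a policy object that is checked after each append, flushing when any size, record-count, or age threshold is reached. This is a minimal sketch: `FlushPolicy`, `BufferStats`, and the three thresholds are hypothetical, loosely mirroring the record-count (`flush.size`) and time-based (`rotate.interval.ms`) rotation settings of Confluent's Java S3 sink connector. The caller is assumed to update `age` itself, for example from a `std::time::Instant` taken when the buffer was opened.

```rust
use std::time::Duration;

/// Hypothetical thresholds: a buffered file is flushed to S3 as soon as any one is reached.
#[derive(Debug, Clone)]
struct FlushPolicy {
    max_records: usize,
    max_bytes: usize,
    max_age: Duration,
}

/// Running statistics for the file currently being buffered for one partition.
#[derive(Debug, Default)]
struct BufferStats {
    records: usize,
    bytes: usize,
    age: Duration, // maintained by the caller
}

impl BufferStats {
    /// Accounts for one appended record of `size` bytes.
    fn append(&mut self, size: usize) {
        self.records += 1;
        self.bytes += size;
    }
}

impl FlushPolicy {
    /// Returns true when the buffer should be written out. An empty buffer never
    /// flushes, so idle partitions do not produce empty objects.
    fn should_flush(&self, stats: &BufferStats) -> bool {
        stats.records > 0
            && (stats.records >= self.max_records
                || stats.bytes >= self.max_bytes
                || stats.age >= self.max_age)
    }
}
```

Keeping the policy separate from the buffer makes it straightforward to add further strategies (for example wall-clock-aligned rotation) without touching the upload path.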

### Phase 2: Schema Management
- Add AWS Glue Schema Registry integration
- Implement schema evolution capabilities
- Implement direct partition management and registration
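Schema evolution needs a rule for deciding whether records with a new schema can still be appended alongside existing files and tables. The sketch below shows one simplified, additive-only compatibility check over a hypothetical in-house field model (`Field`, `FieldType`, `check_additive_evolution`). It is an assumption for illustration, not the compatibility logic of the AWS Glue Schema Registry, which offers its own modes such as BACKWARD and FORWARD.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical subset of column types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FieldType {
    Int64,
    Float64,
    Utf8,
    Bool,
}

#[derive(Debug, Clone)]
struct Field {
    name: String,
    ty: FieldType,
    nullable: bool,
}

/// Convenience constructor for the sketch.
fn field(name: &str, ty: FieldType, nullable: bool) -> Field {
    Field { name: name.to_string(), ty, nullable }
}

/// Hypothetical "additive-only" rule: existing fields keep their name and type and
/// never go from nullable to required, and any new field must be nullable so that
/// previously written Parquet files can still be read with the new schema.
fn check_additive_evolution(old: &[Field], new: &[Field]) -> Result<(), String> {
    let new_by_name: HashMap<&str, &Field> = new.iter().map(|f| (f.name.as_str(), f)).collect();
    for f in old {
        match new_by_name.get(f.name.as_str()) {
            None => return Err(format!("field `{}` was removed", f.name)),
            Some(nf) if nf.ty != f.ty => return Err(format!("field `{}` changed type", f.name)),
            Some(nf) if f.nullable && !nf.nullable => {
                return Err(format!("field `{}` became required", f.name))
            }
            Some(_) => {}
        }
    }
    let old_names: HashSet<&str> = old.iter().map(|f| f.name.as_str()).collect();
    for nf in new {
        if !old_names.contains(nf.name.as_str()) && !nf.nullable {
            return Err(format!("new field `{}` must be nullable", nf.name));
        }
    }
    Ok(())
}
```

Returning a descriptive `Err` rather than a bare `bool` lets the connector log exactly why a schema version was rejected before deciding whether to fail the task or route records elsewhere.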

### Phase 3: Iceberg Integration
- Implement basic Apache Iceberg table format support
- Add AWS Glue Catalog integration
- Implement commit coordination through Kafka
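One design for commit coordination through Kafka (broadly the approach taken by the Apache Iceberg Kafka Connect sink) has workers publish the data files they have flushed to a control topic, and a single coordinator commits those files to the table only after every partition has reported for a given commit. The in-memory sketch below captures just that bookkeeping. The `DataFilesWritten` message, the `CommitCoordinator` type, and numeric commit IDs are hypothetical, and no Kafka or Iceberg client is involved.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical control-topic message a worker publishes after flushing its files.
#[derive(Debug, Clone)]
struct DataFilesWritten {
    commit_id: u64,
    partition: i32,
    files: Vec<String>,
}

/// Collects worker reports per commit and releases a commit only when it is complete.
struct CommitCoordinator {
    expected_partitions: HashSet<i32>,
    /// commit_id -> (partitions that have reported, files reported so far)
    pending: BTreeMap<u64, (HashSet<i32>, Vec<String>)>,
}

impl CommitCoordinator {
    fn new(partitions: impl IntoIterator<Item = i32>) -> Self {
        Self {
            expected_partitions: partitions.into_iter().collect(),
            pending: BTreeMap::new(),
        }
    }

    /// Records one report. Returns every file for `commit_id` once all expected
    /// partitions have reported, at which point the table commit can be attempted.
    fn on_report(&mut self, msg: DataFilesWritten) -> Option<Vec<String>> {
        let (reported, files) = self.pending.entry(msg.commit_id).or_default();
        reported.insert(msg.partition);
        files.extend(msg.files);
        if self.expected_partitions.is_subset(reported) {
            self.pending.remove(&msg.commit_id).map(|(_, all_files)| all_files)
        } else {
            None
        }
    }
}
```

A production coordinator would also need a timeout for partitions that never report and durable state that survives a coordinator restart; both are omitted here.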

### Phase 4: Advanced Features
- Add multi-table fan-out capabilities
- Implement exactly-once semantics
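A common building block for exactly-once delivery to S3 is making every write idempotent. Derive the object key deterministically from the Kafka topic, partition, and starting offset, so that a flush retried after a crash overwrites the same object instead of creating a duplicate, and skip records at or below the last committed offset on restart. The sketch below assumes records within a partition are sorted by offset. `object_key` and `uncommitted` are hypothetical helpers, and the `topic+partition+offset` file name only resembles the convention used by Confluent's S3 sink.

```rust
/// Hypothetical deterministic key: retrying the same flush yields the same S3 key,
/// so a retried upload overwrites rather than duplicates. Zero-padding the offset to
/// 19 digits (the width of i64::MAX) keeps lexical order equal to numeric order.
fn object_key(
    prefix: &str,
    topic: &str,
    partition_path: &str,
    kafka_partition: i32,
    start_offset: i64,
) -> String {
    format!("{prefix}/{topic}/{partition_path}/{topic}+{kafka_partition}+{start_offset:019}.parquet")
}

/// Returns the records that still need writing, given the last committed offset.
/// Assumes `records` is sorted by offset, as it is within a single Kafka partition.
fn uncommitted<T>(records: &[(i64, T)], last_committed: Option<i64>) -> &[(i64, T)] {
    match last_committed {
        None => records,
        Some(committed) => {
            // Binary search for the first offset strictly greater than `committed`.
            let start = records.partition_point(|(offset, _)| *offset <= committed);
            &records[start..]
        }
    }
}
```

Idempotent keys alone do not give end-to-end exactly-once semantics; the committed offset must also be stored atomically with, or after, the data it covers.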

## Technical Challenges

1. **Rust Ecosystem Maturity**: Limited Rust libraries for Iceberg compared to Java
2. **Memory Management**: Efficient buffering and memory management in Rust
3. **AWS Integration**: Proper integration with AWS services (S3, Glue) from Rust
4. **Schema Evolution**: Handling complex schema changes in a type-safe language like Rust
5. **Performance Optimization**: Ensuring the Rust implementation outperforms the Java version

## Comparison with Java Kafka Connect

| Feature | Java Kafka Connect | Franzoxide (Current) | Franzoxide (Target) |
|---------|-------------------|------------------------|------------------------|
| Parquet Support | Full | Partial | Full |
| Partitioning Schemes | Multiple | Basic | Multiple |
| Iceberg Support | Available via connector | Not available | Full support |
| Schema Registry | Multiple options | Not implemented | AWS Glue Schema Registry |
| Performance | Good | Better | Significantly better |
| Memory Footprint | High | Low | Low |
| Exactly-Once Semantics | Supported | Not implemented | Supported |
README.md: 6 changes (3 additions, 3 deletions)
@@ -1,10 +1,10 @@
-# Rust Connect
+# Franzoxide

-A Kafka Connect clone written in Rust with gRPC interface.
+A high-performance Kafka Connect clone written in Rust.

## Project Description

-Rust Connect is a high-performance alternative to Kafka Connect, implemented in Rust. It connects Kafka with S3 storage and aims to provide similar functionality with better performance and resource efficiency.
+Franzoxide is a high-performance alternative to Kafka Connect, implemented in Rust. It connects Kafka with S3 storage and aims to provide similar functionality with better performance and resource efficiency. The name combines "Franz" (referencing Franz Kafka) with "oxide" (a nod to Rust, as rust is iron oxide).

## Features
