# Zerobus Examples

A collection of example applications demonstrating how to use the Databricks Zerobus SDK across different programming languages.
Zerobus is Databricks' streaming ingestion service that allows you to ingest data into Unity Catalog tables using Protocol Buffers over gRPC. This repository contains practical examples to help you get started.
For comprehensive documentation, see the official Databricks Zerobus Ingest documentation.
## Examples

| Example | Language | Description |
|---|---|---|
| hello-world | Rust | Basic example demonstrating the fundamental workflow of the Zerobus SDK, including SDK initialization, stream creation, message encoding, record ingestion, and graceful shutdown. |
| aws-lambda-sqs-ingestor | Rust | AWS Lambda function that processes SQS messages and ingests them into Unity Catalog tables via Zerobus. Includes Terraform infrastructure for deployment with SQS queue, Dead Letter Queue, and Lambda function configured for partial batch response. |
| aws-generic-ingestor | Rust | Generic AWS Lambda function that can ingest events from any AWS service (API Gateway, EventBridge, S3, SNS, etc.) into Unity Catalog tables via Zerobus. Stores event payloads and Lambda context as JSON strings, making it suitable for centralized logging and event auditing. |
## Prerequisites

- Rust 1.70 or later
- A Databricks workspace with Zerobus enabled (contact your Databricks account representative if needed)
- Service principal with OAuth credentials (client ID and client secret)
- A Unity Catalog table configured for Zerobus ingestion
- Buf or the Protocol Buffer compiler (`protoc`) for compiling protobuf bindings
## Getting Started

Clone the repository:

```bash
git clone git@github.com:zcking/zerobus-examples.git
cd zerobus-examples
```

Copy the example environment file and fill in your actual values:

```bash
cp .env.example .env
```

Edit `.env` with your specific configuration and credentials, then export the environment variables in your shell:

```bash
export $(grep -v '^#' .env | grep -v '^$' | xargs)
```

### Creating a Service Principal
- In your Databricks workspace, go to Settings > Identity and Access
- Create a new service principal and generate credentials
- Grant the required permissions for your target table:
```sql
GRANT USE CATALOG ON CATALOG <catalog> TO `<service-principal-uuid>`;
GRANT USE SCHEMA ON SCHEMA <catalog.schema> TO `<service-principal-uuid>`;
GRANT MODIFY, SELECT ON TABLE <catalog.schema.table> TO `<service-principal-uuid>`;
```
### Installing zerobus-generate

The `zerobus-generate` tool generates Protocol Buffer schemas from Unity Catalog tables. Install it globally:
```bash
# Clone the Zerobus Rust SDK repository
git clone https://github.com/databricks/zerobus-sdk-rs.git
cd zerobus-sdk-rs/tools/generate_files

# Build the tool
cargo build --release

# Copy to your cargo bin directory (makes it available globally)
cp target/release/tools ~/.cargo/bin/zerobus-generate

# Verify installation
zerobus-generate --help
```

### Generating Protocol Buffer Schemas

Generate Protocol Buffer schemas from your Unity Catalog table using the `zerobus-generate` command:
```bash
# Set environment variables (or export from .env file)
export DATABRICKS_HOST="https://myworkspace.cloud.databricks.com"
export DATABRICKS_CLIENT_ID="your-client-id"
export DATABRICKS_CLIENT_SECRET="your-client-secret"
export TABLE_NAME="main.default.zerobus_hello_world"

# Generate the .proto file from the Unity Catalog table
zerobus-generate \
  --uc-endpoint $DATABRICKS_HOST \
  --client-id $DATABRICKS_CLIENT_ID \
  --client-secret $DATABRICKS_CLIENT_SECRET \
  --table $TABLE_NAME \
  --output-dir hello-world/proto
```

Note: With the buf-based workflow, `zerobus-generate` only creates the `.proto` file. The Rust bindings and descriptor files are generated by buf during the build process.
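As a reference point, a `buf.gen.yaml` along these lines is one way to produce the Rust bindings; the plugin choice and output directory here are assumptions, so defer to each example's checked-in config:

```yaml
# Hypothetical buf.gen.yaml sketch: generates Rust code from proto/ into gen/
# using the community prost plugin. Check the example's own file for the
# actual configuration.
version: v2
plugins:
  - remote: buf.build/community/neoeinstein-prost
    out: gen
```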
## Repository Structure

```
zerobus-examples/
├── Cargo.toml                 # Rust workspace configuration
├── README.md                  # This file
├── .env.example               # Example environment configuration
├── hello-world/               # Rust: Basic hello world example
│   ├── Cargo.toml
│   ├── buf.yaml
│   ├── buf.gen.yaml
│   ├── Makefile
│   ├── proto/
│   ├── gen/                   # Generated code (gitignored)
│   └── src/
├── aws-lambda-sqs-ingestor/   # Rust: AWS Lambda SQS ingestor
│   ├── Cargo.toml
│   ├── buf.yaml
│   ├── terraform/
│   └── ...
└── aws-generic-ingestor/      # Rust: AWS Lambda generic ingestor
    ├── Cargo.toml
    ├── buf.yaml
    ├── terraform/
    └── ...
```
Each example is self-contained with its own dependencies, proto files, and README. Future examples in other languages (Python, Go, etc.) will follow the same pattern.
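The root `Cargo.toml` ties the Rust examples into a single Cargo workspace; a minimal sketch (assumed, not the actual file) would look like:

```toml
# Hypothetical workspace manifest; member names follow the tree above.
[workspace]
resolver = "2"
members = [
    "hello-world",
    "aws-lambda-sqs-ingestor",
    "aws-generic-ingestor",
]
```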
## How It Works

### Endpoints

The SDK requires two endpoints:

- Zerobus Endpoint: The gRPC endpoint for streaming data ingestion (format: `https://<workspace_id>.zerobus.<region>.cloud.databricks.com`)
- Databricks Host: Your workspace URL, used for Unity Catalog authentication and table metadata
### Authentication

The SDK handles OAuth 2.0 authentication automatically using service principal credentials. You only need to provide:
- Client ID: Service principal application ID
- Client Secret: Service principal secret
These credentials are used to obtain and refresh access tokens as needed. The SDK manages token lifecycle internally.
### Stream Lifecycle

1. Create Stream: Opens an authenticated bidirectional gRPC stream to the Zerobus service
2. Ingest Records: Send Protocol Buffer encoded data representing table rows
3. Acknowledgments: Each record returns a future that resolves when the service acknowledges durability
4. Flush: Force pending records to be transmitted (useful before shutdown)
5. Close: Gracefully shut down the stream, ensuring all records are acknowledged
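Put together, a stream's life in application code looks roughly like the sketch below. The type and method names (`ZerobusSdk`, `create_stream`, `ingest_record`, and so on) are illustrative stand-ins written from the steps above, not the SDK's verbatim API; the hello-world example shows the real calls.

```rust
// Illustrative sketch only: ZerobusSdk, create_stream, ingest_record, flush,
// and close are hypothetical names mirroring the lifecycle above. RowProto is
// a placeholder for your generated message struct.
async fn ingest_rows(rows: Vec<RowProto>) -> Result<(), Box<dyn std::error::Error>> {
    // Credentials and endpoints come from the environment (see .env.example).
    let sdk = ZerobusSdk::new(
        std::env::var("ZEROBUS_ENDPOINT")?,     // gRPC ingestion endpoint
        std::env::var("DATABRICKS_HOST")?,      // workspace URL for UC auth
        std::env::var("DATABRICKS_CLIENT_ID")?, // service principal app ID
        std::env::var("DATABRICKS_CLIENT_SECRET")?,
    );

    // 1. Open an authenticated bidirectional gRPC stream to the target table.
    let mut stream = sdk.create_stream(std::env::var("TABLE_NAME")?).await?;

    // 2-3. Send encoded records; each call returns a future that resolves
    // once the service acknowledges the record as durable.
    let mut acks = Vec::new();
    for row in rows {
        acks.push(stream.ingest_record(row).await?);
    }

    // 4. Flush anything still buffered, then wait for every acknowledgment.
    stream.flush().await?;
    for ack in acks {
        ack.await?;
    }

    // 5. Close gracefully; no records are lost once every ack has resolved.
    stream.close().await?;
    Ok(())
}
```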
### Protocol Buffer Workflow

The Zerobus service uses Protocol Buffers for efficient data serialization. The workflow:

1. Generate Schema: Use the `zerobus-generate` tool (installed globally) to automatically generate Protocol Buffer definitions from your Unity Catalog table schema
2. Include Generated Code: The generation step produces three files:
   - `.proto`: Protocol Buffer schema definition
   - `.rs`: Rust code with message structs
   - `.descriptor`: Binary descriptor for runtime type information
3. Encode and Send: Create instances of your message structs, encode them, and send them via the stream (see the sketch below)
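As a concrete illustration, assuming the bindings are generated with prost (a common target for buf's Rust plugins), encoding a record could look like the following. The `ZerobusHelloWorld` struct and its fields are hypothetical stand-ins for whatever your table schema generates:

```rust
use prost::Message;

// Hypothetical stand-in for the generated struct; the real one is produced
// from your table schema and lives under gen/.
#[derive(Clone, PartialEq, Message)]
struct ZerobusHelloWorld {
    #[prost(int64, tag = "1")]
    id: i64,
    #[prost(string, optional, tag = "2")]
    value: Option<String>,
}

fn main() {
    let record = ZerobusHelloWorld { id: 1, value: Some("hello".into()) };
    // Serialize to the Protocol Buffers wire format for ingestion.
    let bytes: Vec<u8> = record.encode_to_vec();
    println!("encoded {} bytes", bytes.len());
}
```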
### Stream Configuration

The SDK supports various configuration options via `StreamConfigurationOptions`:

| Option | Default | Description |
|---|---|---|
| `max_inflight_records` | 50000 | Maximum number of unacknowledged records |
| `recovery` | true | Enable automatic stream recovery on failures |
| `recovery_timeout_ms` | 15000 | Timeout for recovery operations (ms) |
| `recovery_backoff_ms` | 2000 | Delay between recovery attempts (ms) |
| `recovery_retries` | 3 | Maximum number of recovery attempts |
| `flush_timeout_ms` | 300000 | Timeout for flush operations (ms) |
| `server_lack_of_ack_timeout_ms` | 60000 | Server acknowledgment timeout (ms) |
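For example, overriding a couple of defaults might look like this; the field names follow the table above, while the struct-literal syntax and `Default` support are assumptions to verify against the SDK:

```rust
// Sketch: raise the in-flight window and retry budget, keep other defaults.
// Field names follow the table above; Default support is an assumption.
let options = StreamConfigurationOptions {
    max_inflight_records: 100_000,
    recovery_retries: 5,
    ..Default::default()
};
```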
## Troubleshooting

Authentication errors:
- Verify your service principal credentials are correct
- Ensure the service principal has the required permissions on the target table
- Check that your Databricks host URL is correct
Connection errors:
- Verify your Zerobus endpoint format: `https://<workspace_id>.zerobus.<region>.cloud.databricks.com`
- Ensure network connectivity to your Databricks workspace
- Check that Zerobus is enabled for your workspace
Schema errors:
- Regenerate your Protocol Buffer files if your table schema has changed (example commands below)
- Ensure the descriptor file matches your current table schema
- Verify field types match between your data and the schema
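If regeneration is needed, the pass might look like the following, reusing the flags shown earlier; the `buf generate` step assumes buf drives codegen for the example, as the note above describes:

```bash
# Re-create the .proto from the current table schema (flags as shown earlier)
zerobus-generate \
  --uc-endpoint $DATABRICKS_HOST \
  --client-id $DATABRICKS_CLIENT_ID \
  --client-secret $DATABRICKS_CLIENT_SECRET \
  --table $TABLE_NAME \
  --output-dir hello-world/proto

# Refresh the Rust bindings and descriptor from the example directory
cd hello-world && buf generate
```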
## Resources

- Databricks Zerobus Ingest Documentation
- Zerobus Rust SDK Repository
- Databricks Unity Catalog Documentation
- Protocol Buffers Documentation
## License

Apache-2.0