Feature: Enable Unbounded (Streaming) Mode for BufferedStorageBackend with GCS Datastore #377

@chowbao

Summary

Add support for unbounded mode across all export commands when using the BufferedStorageBackend with the GCS ledger metadata datastore. Currently, unbounded mode is documented but marked as "Currently Unsupported" in the README. This feature will allow stellar-etl to run as a long-lived streaming process that continuously exports new ledgers as they close on the Stellar network, without requiring an --end-ledger flag.

Motivation

Today, every export command (export_ledgers, export_transactions, export_operations, export_effects, export_assets, export_trades, export_diagnostic_events, export_ledger_entry_changes) requires both --start-ledger and --end-ledger to define a bounded range. This means:

  • No continuous streaming: Users must repeatedly launch new stellar-etl processes with updated ledger ranges, adding operational overhead and introducing gaps or latency.
  • Wasted infrastructure cycles: Each bounded invocation incurs startup/teardown costs (container spin-up, datastore connection, buffer warming).
  • Near-real-time pipelines are difficult to build: Downstream consumers (BigQuery, Kafka, custom pipelines) cannot receive data as soon as ledgers close without a wrapper orchestrator that polls for the latest ledger and re-invokes stellar-etl.

The upstream Go SDK (github.com/stellar/go/ingest/ledgerbackend) already supports UnboundedRange on BufferedStorageBackend. The PrepareRange method accepts ledgerbackend.UnboundedRange(startLedger), and GetLedger will block and wait for the next sequentially written ledger file in the datastore. The CDP ApplyLedgerMetadata helper also supports unbounded ranges. This means the infrastructure to support streaming already exists at the SDK level — stellar-etl just needs to wire it up.

Current Behavior

  • The README documents unbounded mode under export_ledger_entry_changes as "Unbounded (Currently Unsupported)".
  • When only --start-ledger is provided (no --end-ledger), the ETL either errors out or does not function as a streaming process for the datastore backend.
  • Unbounded mode was previously supported when using the captive-core backend (--captive-core flag), where Stellar-Core connects directly to the network. It has not been implemented for the GCS BufferedStorageBackend path.

Proposed Behavior

When a user provides only --start-ledger (and omits --end-ledger, or sets --end-ledger 0) without the --captive-core flag, stellar-etl should:

  1. Initialize the BufferedStorageBackend with the GCS datastore as it does today.
  2. Call PrepareRange with ledgerbackend.UnboundedRange(startLedger) instead of ledgerbackend.BoundedRange(startLedger, endLedger).
  3. Enter a continuous processing loop where GetLedger blocks and waits for the next ledger file to appear in the GCS datastore (written by [Galexie / Ledger Exporter](https://github.com/stellar/go/blob/master/exp/services/ledgerexporter/README.md)).
  4. Transform and export each ledger's data as it becomes available, using the same output semantics (batched file output, stdout, etc.).
  5. Continue indefinitely until the process is terminated (SIGINT/SIGTERM), at which point it should gracefully shut down, flush any pending batch, and close the backend.

This should apply to all export commands, not just export_ledger_entry_changes.

Scope of Changes

1. Backend Initialization (internal/input/)

  • In the function(s) that create the BufferedStorageBackend and call PrepareRange, detect whether the run is bounded or unbounded based on the presence/absence of --end-ledger.
  • If unbounded, use ledgerbackend.UnboundedRange(startLedger).
  • If bounded, continue using ledgerbackend.BoundedRange(startLedger, endLedger) as today.
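The detection rule above can be sketched as a small pure function. Note that rangeSpec and resolveRange are hypothetical names for this sketch; the real code would call ledgerbackend.UnboundedRange / ledgerbackend.BoundedRange directly:

```go
package main

import "fmt"

// rangeSpec is a stand-in for the SDK's ledger range types, used here
// only to keep the sketch self-contained.
type rangeSpec struct {
	from      uint32
	to        uint32
	unbounded bool
}

// resolveRange applies the proposed convention: --end-ledger omitted
// or set to 0 selects unbounded (streaming) mode; any other value
// selects the existing bounded behavior.
func resolveRange(startLedger, endLedger uint32) rangeSpec {
	if endLedger == 0 {
		return rangeSpec{from: startLedger, unbounded: true}
	}
	return rangeSpec{from: startLedger, to: endLedger}
}

func main() {
	fmt.Println(resolveRange(57000000, 0).unbounded)        // streaming mode
	fmt.Println(resolveRange(57000000, 57000100).unbounded) // bounded mode
}
```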

2. Export Command Logic (cmd/)

  • For each export_*.go command, update the ledger processing loop to support indefinite iteration when in unbounded mode.
  • The loop should call GetLedger(ctx, nextSequence) which will block until that ledger is available in the datastore.
  • Handle context cancellation and OS signals (SIGINT, SIGTERM) for graceful shutdown.
  • Continue to export in batches (controlled by --batch-size), flushing each completed batch to output before starting the next.

3. Retry and Resilience

  • Leverage the existing --retry-limit and --retry-wait flags for transient GCS read failures.
  • Consider adding a --max-wait or similar flag to control how long the process waits for a new ledger before logging a warning (optional, for observability — the default behavior should be to wait indefinitely as the SDK does).

4. Graceful Shutdown

  • Register OS signal handlers (SIGINT, SIGTERM).
  • On signal, cancel the context passed to GetLedger, flush any in-progress batch to output, and call backend.Close().
  • Exit with code 0 on clean shutdown.

5. README / Documentation

  • Update the "Unbounded (Currently Unsupported)" section to reflect that unbounded mode is now supported for the datastore backend.
  • Add usage examples for unbounded mode, e.g.:

    # Stream all ledger data starting from ledger 57000000
    stellar-etl export_ledgers --start-ledger 57000000 --output streamed_ledgers/

    # Stream ledger entry changes continuously
    stellar-etl export_ledger_entry_changes --start-ledger 57000000 \
      --output streamed_changes/ --batch-size 64

  • Document the graceful shutdown behavior.

6. Tests

  • Add unit tests for the unbounded range detection logic.
  • Add integration tests that simulate the datastore having ledgers written incrementally and verify the ETL processes them in order.
  • Test graceful shutdown behavior (context cancellation mid-stream).

CLI Interface

No new flags are strictly required. The existing convention is:

Flags Provided                                             Mode
--start-ledger + --end-ledger                              Bounded (existing behavior)
--start-ledger only (no --end-ledger, or --end-ledger 0)   Unbounded / streaming (new)

Optional new flag (nice-to-have):

  • --unbounded-idle-timeout (duration): If no new ledger appears in the datastore within this duration, log a warning. Default: no timeout (wait forever).

Reference Implementation

The upstream Go SDK CDP example demonstrates unbounded streaming with BufferedStorageBackend:

// From Stellar CDP consumer pipeline sample
latestNetworkLedger, err := historyArchive.GetLatestLedgerSequence()
if err != nil {
    return err
}
ledgerRange := ledgerbackend.UnboundedRange(latestNetworkLedger)

pubConfig := cdp.PublisherConfig{
    DataStoreConfig:       adapter.dataStoreConfig,
    BufferedStorageConfig: cdp.DefaultBufferedStorageBackendConfig(
        adapter.dataStoreConfig.Schema.LedgersPerFile,
    ),
}

cdp.ApplyLedgerMetadata(ledgerRange, pubConfig, ctx, callback)

The BufferedStorageBackend.GetLedger() call will block until the requested ledger sequence is available in the GCS bucket, making it naturally suitable for streaming.

Acceptance Criteria

  • All export commands (export_ledgers, export_transactions, export_operations, export_effects, export_assets, export_trades, export_diagnostic_events, export_ledger_entry_changes) support unbounded mode when using the datastore backend.
  • Providing only --start-ledger (without --end-ledger) starts the ETL in unbounded/streaming mode.
  • The process continuously exports new ledger data as it appears in the GCS datastore.
  • Batch output semantics are preserved (files are flushed per --batch-size ledgers).
  • Graceful shutdown on SIGINT/SIGTERM: in-progress batch is flushed, backend is closed, exit code 0.
  • Existing bounded mode behavior is unaffected (no regressions).
  • Unit and integration tests cover unbounded mode initialization, streaming, and shutdown.
  • README updated to remove "Currently Unsupported" label and document unbounded mode usage.

Labels

enhancement, feature
