43 changes: 42 additions & 1 deletion docs/ORCHESTRATION-GUIDE.md
@@ -15,7 +15,8 @@
6. [Common Patterns](#common-patterns)
7. [Anti-Patterns (What to Avoid)](#anti-patterns)
8. [Complete Examples](#complete-examples)
9. [Testing Your Orchestrations](#testing)
9. [Orchestration Versioning](#orchestration-versioning)
10. [Testing Your Orchestrations](#testing)

---

@@ -2046,6 +2047,46 @@ async fn typed_workflow(

---

## Orchestration Versioning

When running long-lived orchestrations in production, you'll need to evolve their logic over time while instances are still running. Duroxide supports orchestration versioning to handle this safely.

### Key Concepts

**Version Registration:**
```rust
// Default version (1.0.0)
OrchestrationRegistry::builder()
    .register_typed(MY_ORCHESTRATION, my_orchestration)
    // Explicit versions
    .register_versioned_typed(MY_ORCHESTRATION, "1.0.1", my_orchestration_1_0_1)
    .register_versioned_typed(MY_ORCHESTRATION, "1.0.2", my_orchestration_1_0_2)
    .build();
```

**Version Upgrade Timing (Critical):**
Version upgrades happen at `continue_as_new` time, **not** when the server restarts. If an instance on an old version is waiting on a timer, that timer must fire and the current cycle must finish before the upgrade occurs.

```
T+0:00 Server restarts with v1.0.2 registered
But v1.0.1 was mid-cycle with ~1 min left on its timer
T+1:00 v1.0.1's timer expires, completes its work
T+1:01 v1.0.1 calls continue_as_new() → resolves to v1.0.2
Database updated: orchestration_version = "1.0.2"
```

**Best Practices:**
- Keep the orchestration NAME constant stable across all versions
- Create separate functions per version: `my_orch()`, `my_orch_1_0_1()`, `my_orch_1_0_2()`
- Add version prefix to trace logs: `ctx.trace_info("[v1.0.2] Starting cycle...")`
- Never remove old version registrations while instances might still be running them

📚 **See [production-patterns.md](production-patterns.md) for comprehensive patterns including code organization, common scenarios, and a complete best practices checklist.**

---


## Debugging Tips

### 1. Use Trace Statements
293 changes: 293 additions & 0 deletions docs/production-patterns.md
@@ -0,0 +1,293 @@
# Orchestration Versioning Best Practices

This guide covers practical patterns for managing versioned orchestrations in production, particularly for long-running workflows that use `continue_as_new`.

## Table of Contents

1. [Code Organization Pattern](#code-organization-pattern)
2. [Registration Pattern](#registration-pattern)
3. [Version Upgrade Timing](#version-upgrade-timing)
4. [Best Practices Checklist](#best-practices-checklist)
5. [Version Info Flow](#version-info-flow)
6. [Common Scenarios](#common-scenarios)

---

## Code Organization Pattern

Keep the orchestration NAME constant stable across all versions. Create separate functions for each version with a version suffix:

```rust
// names.rs - Keep NAME constant stable
pub const MY_ORCHESTRATION: &str = "my-app::orchestration::my-orch";

// my_orchestration.rs
use duroxide::OrchestrationContext;
use std::time::Duration;

/// Version 1.0.0 - Original implementation
pub async fn my_orchestration(
    ctx: OrchestrationContext,
    input: String,
) -> Result<String, String> {
    ctx.trace_info("Starting...");
    // Original logic
    ctx.schedule_timer(Duration::from_secs(60)).into_timer().await;
    ctx.continue_as_new(input).await
}

/// Version 1.0.1 - Added version prefix to traces for debugging
pub async fn my_orchestration_1_0_1(
    ctx: OrchestrationContext,
    input: String,
) -> Result<String, String> {
    ctx.trace_info("[v1.0.1] Starting...");
    // Same logic with improved observability
    ctx.schedule_timer(Duration::from_secs(60)).into_timer().await;
    ctx.continue_as_new(input).await
}

/// Version 1.0.2 - Increased timer interval
pub async fn my_orchestration_1_0_2(
    ctx: OrchestrationContext,
    input: String,
) -> Result<String, String> {
    ctx.trace_info("[v1.0.2] Starting...");
    // Updated logic with longer interval (60s -> 300s)
    ctx.schedule_timer(Duration::from_secs(300)).into_timer().await;
    ctx.continue_as_new(input).await
}
```

**Why this pattern works:**
- The NAME constant provides stable identity for the orchestration
- Separate functions allow different implementations without breaking replay
- Version suffixes make code navigation easy (`my_orch_1_0_1`, `my_orch_1_0_2`)
- Trace prefixes like `[v1.0.2]` make log analysis straightforward

---

## Registration Pattern

Register every version in the registry: the default version with `register_typed`, and each explicit version with `register_versioned_typed`:

```rust
use crate::orchestrations::{
    MY_ORCHESTRATION,
    my_orchestration,
    my_orchestration_1_0_1,
    my_orchestration_1_0_2,
};

let orchestrations = OrchestrationRegistry::builder()
    // Default version (1.0.0) - use register_typed
    .register_typed(MY_ORCHESTRATION, my_orchestration)
    // Explicit versions - use register_versioned_typed
    .register_versioned_typed(MY_ORCHESTRATION, "1.0.1", my_orchestration_1_0_1)
    .register_versioned_typed(MY_ORCHESTRATION, "1.0.2", my_orchestration_1_0_2)
    .build();
```

**Key points:**
- `register_typed()` registers at version `1.0.0` by default
- `register_versioned_typed()` registers at the specified semver version
- The default version policy is `Latest` - new starts and `continue_as_new` use the highest version
- Use `set_policy()` on the registry builder to pin new instances to a specific version if needed
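
The difference between a `Latest` policy and a pinned version can be sketched with a small self-contained model. This is a toy illustration, not Duroxide's implementation: `SemVer`, `Policy`, `parse_version`, and `resolve` are hypothetical names introduced here, and only plain `major.minor.patch` versions are handled.

```rust
use std::collections::BTreeSet;

/// A parsed `major.minor.patch` version. Tuples compare field by field,
/// so (1, 0, 10) sorts above (1, 0, 9), unlike the raw strings.
type SemVer = (u64, u64, u64);

/// Hypothetical stand-in for a version policy; not the Duroxide type.
#[derive(Debug, Clone, PartialEq)]
enum Policy {
    Latest,
    Exact(SemVer),
}

/// Parses "1.0.2" into (1, 0, 2). Returns None for anything malformed.
fn parse_version(s: &str) -> Option<SemVer> {
    let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    // Reject trailing components such as "1.0.2.3".
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Picks the version a new execution would run under the given policy.
fn resolve(registered: &BTreeSet<SemVer>, policy: &Policy) -> Option<SemVer> {
    match policy {
        // Highest registered version wins.
        Policy::Latest => registered.iter().next_back().copied(),
        // A pinned version is used only if it is actually registered.
        Policy::Exact(v) => registered.get(v).copied(),
    }
}
```

The point of the model is that "highest version" means numeric comparison per component, so registering `1.0.10` supersedes `1.0.9` even though the strings sort the other way.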

---

## Version Upgrade Timing

**Critical concept:** Version upgrades happen at `continue_as_new` time, NOT when the server restarts.

This explains why you might see a delay between deploying a new version and seeing it reflected in your UI/monitoring:

```
Timeline example:

T+0:00 Server restarts with v1.0.2 registered
But v1.0.1 was mid-cycle with ~1 min left on its timer

T+1:00 v1.0.1's timer expires, orchestration does its work

T+1:01 v1.0.1 calls continue_as_new()
→ Duroxide resolves "Latest" policy → v1.0.2
→ Database updated: orchestration_version = "1.0.2"
→ New execution starts with v1.0.2 code

T+1:02 UI/API refreshes → now shows v1.0.2
```

**Why this matters:**
- Duroxide doesn't interrupt running orchestrations
- The version policy (`Latest`) is evaluated when a new execution starts
- If you have a long timer (e.g., 5 minutes), the old version runs until that timer expires
- This is intentional - it ensures orchestration determinism and clean execution boundaries
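
The timing rule can be made concrete with a toy model of a single instance. It is a sketch under simplifying assumptions, not Duroxide code: `Instance` and `tick` are hypothetical names, versions are plain tuples, and any time left over after the timer fires is ignored.

```rust
/// Toy model of one long-running instance; not a Duroxide type.
#[derive(Debug)]
struct Instance {
    /// Version the current execution was started with.
    running_version: (u64, u64, u64),
    /// Seconds left on the timer of the current cycle.
    timer_remaining_secs: u64,
}

/// Advances the clock by `elapsed_secs`. The version changes only when the
/// timer has expired and the instance reaches its continue-as-new point,
/// where the highest registered version (`latest`) is picked up.
fn tick(
    instance: &mut Instance,
    elapsed_secs: u64,
    latest: (u64, u64, u64),
    next_timer_secs: u64,
) {
    if elapsed_secs < instance.timer_remaining_secs {
        // Still mid-cycle: the running execution is left alone,
        // even if a newer version has been registered meanwhile.
        instance.timer_remaining_secs -= elapsed_secs;
    } else {
        // Timer fired: the cycle ends and a fresh execution starts on `latest`.
        instance.running_version = latest;
        instance.timer_remaining_secs = next_timer_secs;
    }
}
```

Starting from an instance on `(1, 0, 1)` with 60 seconds left, a 30-second tick with `(1, 0, 2)` registered leaves the version unchanged; the next 30-second tick crosses the timer boundary and the instance moves to `(1, 0, 2)`.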

---

## Best Practices Checklist

### ✅ DO

1. **Keep NAME constants stable** across all versions
```rust
pub const MY_ORCH: &str = "myapp::orchestration::my-orch"; // Never changes
```

2. **Use version suffix on function names**
```rust
my_orch() // v1.0.0
my_orch_1_0_1() // v1.0.1
my_orch_1_0_2() // v1.0.2
```

3. **Add version prefix to trace logs** for easier debugging
```rust
ctx.trace_info("[v1.0.2] Starting health check cycle");
ctx.trace_info("[v1.0.2] Completed iteration, continuing...");
```

4. **Document version changes** in code comments
```rust
/// Version 1.0.2 - Changes from 1.0.1:
/// - Increased timer interval from 2 min to 5 min
/// - Added retry logic for health checks
```

5. **Test version transitions** in development before production
- Deploy new version
- Verify old version completes its cycle
- Verify new version starts after `continue_as_new`

### ❌ DON'T

1. **Don't change the NAME constant** when creating new versions
2. **Don't expect immediate version switch** after server restart
3. **Don't use `continue_as_new_versioned()`** unless you specifically need to pin to a version
4. **Don't delete old version code** until all instances have upgraded
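
Before deleting old version code, it helps to check which registered versions are still reported by running instances. The helper below is a hypothetical sketch: `removable_versions` is not part of Duroxide, and it assumes you have already collected the version string of every running instance (for example from `info.orchestration_version`).

```rust
use std::collections::BTreeSet;

/// Returns the registered versions that no running instance reports.
/// Both inputs are plain version strings such as "1.0.1".
fn removable_versions<'a>(registered: &[&'a str], in_use: &[&str]) -> Vec<&'a str> {
    // Deduplicate the versions seen on running instances.
    let live: BTreeSet<&str> = in_use.iter().copied().collect();
    // Keep only registered versions that no instance is still running.
    registered
        .iter()
        .copied()
        .filter(|v| !live.contains(*v))
        .collect()
}
```

With `["1.0.0", "1.0.1", "1.0.2"]` registered and instances reporting `1.0.1` and `1.0.2`, only `1.0.0` is returned. The highest version should stay registered regardless, since it is the one new executions resolve to.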

---

## Version Info Flow

Understanding how version information flows through the system:

```
Duroxide Runtime
┌─────────────────────────────────────────────────┐
│ Provider (instances table) │
│ ├── orchestration_name: "my-app::my-orch" │
│ └── orchestration_version: "1.0.2" │
└─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ Client.get_instance_info() │
│ └── Returns InstanceInfo { │
│ orchestration_version: "1.0.2" │
│ } │
└─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ Your Backend API │
│ └── /api/orchestrations/:id │
│ { "version": "1.0.2", ... } │
└─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│ Your UI / Dashboard │
│ └── Displays: "Version: 1.0.2" │
└─────────────────────────────────────────────────┘
```

To display version in your UI:
1. Call `client.get_instance_info(instance_id)`
2. Access `info.orchestration_version`
3. Include in your API response
4. Display in your dashboard
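
Steps 2 and 3 can be sketched as a small mapping function. This is an illustration only: the `InstanceInfo` struct below is a local stand-in that carries just the one field this guide mentions, `version_response` and `escape_json` are hypothetical helpers, and a real service would normally use a JSON library rather than formatting the body by hand.

```rust
/// Local stand-in with only the field this guide mentions; the real
/// `InstanceInfo` comes from Duroxide and may carry more fields.
struct InstanceInfo {
    orchestration_version: String,
}

/// Minimal escaping for backslashes and double quotes (control
/// characters are not handled in this sketch).
fn escape_json(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Builds the JSON body for a hypothetical `/api/orchestrations/:id` response.
fn version_response(instance_id: &str, info: &InstanceInfo) -> String {
    format!(
        "{{\"id\":\"{}\",\"version\":\"{}\"}}",
        escape_json(instance_id),
        escape_json(&info.orchestration_version)
    )
}
```

Keeping the mapping in one function means the dashboard always shows the version stored by the provider, which only changes at `continue_as_new` boundaries.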

---

## Common Scenarios

### Scenario 1: Hot-fix deployment

You need to deploy a critical fix to a long-running orchestration:

```rust
// v1.0.1 - Critical fix for health check timeout
pub async fn health_monitor_1_0_1(ctx: OrchestrationContext, input: String) -> Result<String, String> {
    ctx.trace_info("[v1.0.1] Starting with increased timeout fix");

    // Fixed: Increased timeout from 5s to 30s
    let result = ctx.schedule_activity_with_retry(
        "HealthCheck",
        input.clone(),
        RetryPolicy::new(3).with_timeout(Duration::from_secs(30)),
    ).await?;

    ctx.schedule_timer(Duration::from_secs(60)).into_timer().await;
    ctx.continue_as_new(input).await
}
```

**What happens:**
1. Deploy server with v1.0.1 registered
2. Running v1.0.0 instances complete their current timer
3. On `continue_as_new`, they automatically upgrade to v1.0.1
4. New fix takes effect without manual intervention

### Scenario 2: Graceful migration with state changes

If you need to transform state between versions:

```rust
// v2.0.0 - New state format
pub async fn my_orch_2_0_0(ctx: OrchestrationContext, input: String) -> Result<String, String> {
    ctx.trace_info("[v2.0.0] Starting with new state format");

    // Handle migration from v1 state format.
    // `migrate_v1_to_v2` is assumed to return Result<MyState, String>.
    let state: MyState = if input.starts_with("{\"v1\":") {
        ctx.trace_info("[v2.0.0] Migrating from v1 state format");
        migrate_v1_to_v2(&input)?
    } else {
        // serde_json::Error does not convert to String via `?`, so map it explicitly
        serde_json::from_str(&input).map_err(|e| e.to_string())?
    };

    // Continue with v2 logic...
    let next_input = serde_json::to_string(&state).map_err(|e| e.to_string())?;
    ctx.continue_as_new(next_input).await
}
```

### Scenario 3: Pinning to exact version

If you need to prevent automatic upgrades (e.g., during testing):

```rust
let orchestrations = OrchestrationRegistry::builder()
    .register_typed(MY_ORCH, my_orchestration)
    .register_versioned_typed(MY_ORCH, "1.0.1", my_orchestration_1_0_1)
    .set_policy(MY_ORCH, VersionPolicy::Exact(Version::parse("1.0.0").unwrap()))
    .build();
```

Or use `continue_as_new_versioned` for explicit control:

```rust
// Stay on current version explicitly
ctx.continue_as_new_versioned("1.0.0", input).await
```

---

## See Also

- [Migration Guide](migration-guide.md) - For major version migrations
- [Continue As New Semantics](continue-as-new.md) - Deep dive into CAN behavior
- [ORCHESTRATION-GUIDE.md](ORCHESTRATION-GUIDE.md) - Complete orchestration reference