affandar · affandar · Jan 5, 2026 · Jan 5, 2026 · Jan 6, 2026
diff --git a/docs/ORCHESTRATION-GUIDE.md b/docs/ORCHESTRATION-GUIDE.md
@@ -15,7 +15,8 @@
 6. [Common Patterns](#common-patterns)
 7. [Anti-Patterns (What to Avoid)](#anti-patterns)
 8. [Complete Examples](#complete-examples)
-9. [Testing Your Orchestrations](#testing)
+9. [Orchestration Versioning](#orchestration-versioning)
+10. [Testing Your Orchestrations](#testing)
 
 ---
 
@@ -2046,6 +2047,46 @@ async fn typed_workflow(
 
 ---
 
+## Orchestration Versioning
+
+When running long-lived orchestrations in production, you'll need to evolve their logic over time while instances are still running. Duroxide supports orchestration versioning to handle this safely.
+
+### Key Concepts
+
+**Version Registration:**
+```rust
+// Default version (1.0.0)
+OrchestrationRegistry::builder()
+    .register_typed(MY_ORCHESTRATION, my_orchestration)
+
+// Explicit versions  
+    .register_versioned_typed(MY_ORCHESTRATION, "1.0.1", my_orchestration_1_0_1)
+    .register_versioned_typed(MY_ORCHESTRATION, "1.0.2", my_orchestration_1_0_2)
+    .build();
+```
+
+**Version Upgrade Timing (Critical):**  
+Version upgrades happen at `continue_as_new` time, **not** when the server restarts. If the old version is waiting on a timer, that timer must fire and the execution must reach `continue_as_new` before the upgrade occurs.
+
+```
+T+0:00  Server restarts with v1.0.2 registered
+        But v1.0.1 was mid-cycle with ~1 min left on its timer
+T+1:00  v1.0.1's timer expires, completes its work
+T+1:01  v1.0.1 calls continue_as_new() → resolves to v1.0.2
+        Database updated: orchestration_version = "1.0.2"
+```
+
+**Best Practices:**
+- Keep the orchestration NAME constant stable across all versions
+- Create separate functions per version: `my_orch()`, `my_orch_1_0_1()`, `my_orch_1_0_2()`
+- Add version prefix to trace logs: `ctx.trace_info("[v1.0.2] Starting cycle...")`
+- Never remove old version registrations while instances might still be running them
+
+📚 **See [production-patterns.md](production-patterns.md) for comprehensive patterns including code organization, common scenarios, and a complete best practices checklist.**
+
+---
+
+
 ## Debugging Tips
 
 ### 1. Use Trace Statements
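
The upgrade timeline in the Orchestration Versioning hunk above can be modeled without Duroxide. The sketch below is a toy, standard-library-only model, not Duroxide's implementation: `Registry`, `Policy`, `Instance`, and `upgrade_timeline` are hypothetical names introduced for illustration. It assumes the version is resolved only when an execution starts, which is the behavior the section describes.

```rust
use std::collections::BTreeMap;

/// Semantic version as (major, minor, patch); tuples compare field by field.
type Version = (u64, u64, u64);

/// Hypothetical stand-in for a version policy: follow the newest registration or pin one.
#[derive(Clone, Copy)]
enum Policy {
    Latest,
    Exact(Version),
}

/// Toy registry for a single orchestration name: version -> handler label.
struct Registry {
    handlers: BTreeMap<Version, &'static str>,
    policy: Policy,
}

impl Registry {
    fn new(policy: Policy) -> Self {
        Registry { handlers: BTreeMap::new(), policy }
    }

    fn register(&mut self, version: Version, handler: &'static str) {
        self.handlers.insert(version, handler);
    }

    /// Resolve the version a new execution should run under.
    fn resolve(&self) -> Option<Version> {
        match self.policy {
            // BTreeMap keys are sorted, so the last key is the highest version.
            Policy::Latest => self.handlers.keys().next_back().copied(),
            Policy::Exact(v) => self.handlers.contains_key(&v).then_some(v),
        }
    }
}

/// A running instance remembers the version its current execution started with.
struct Instance {
    version: Version,
}

impl Instance {
    fn start(registry: &Registry) -> Option<Self> {
        registry.resolve().map(|version| Instance { version })
    }

    /// The policy is re-evaluated only here; a redeploy alone changes nothing.
    fn continue_as_new(&mut self, registry: &Registry) {
        if let Some(version) = registry.resolve() {
            self.version = version;
        }
    }
}

/// Replays the timeline: redeploy with 1.0.2 mid-cycle, then continue_as_new.
fn upgrade_timeline() -> Vec<Version> {
    let mut registry = Registry::new(Policy::Latest);
    registry.register((1, 0, 0), "my_orchestration");
    registry.register((1, 0, 1), "my_orchestration_1_0_1");
    let mut instance = Instance::start(&registry).expect("a version is registered");
    let mut seen = vec![instance.version];

    // T+0:00: server restarts with 1.0.2 registered; the instance is mid-cycle.
    registry.register((1, 0, 2), "my_orchestration_1_0_2");
    seen.push(instance.version);

    // T+1:01: the timer has fired and the old version calls continue_as_new.
    instance.continue_as_new(&registry);
    seen.push(instance.version);
    seen
}
```

Calling `upgrade_timeline()` yields 1.0.1, 1.0.1, 1.0.2: registering 1.0.2 does not move the running instance, and only `continue_as_new` does. Constructing the registry with `Policy::Exact((1, 0, 0))` keeps `resolve()` at 1.0.0 even when newer versions are registered.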

diff --git a/docs/production-patterns.md b/docs/production-patterns.md
@@ -0,0 +1,293 @@
+# Orchestration Versioning Best Practices
+
+This guide covers practical patterns for managing versioned orchestrations in production, particularly for long-running workflows that use `continue_as_new`.
+
+## Table of Contents
+
+1. [Code Organization Pattern](#code-organization-pattern)
+2. [Registration Pattern](#registration-pattern)
+3. [Version Upgrade Timing](#version-upgrade-timing)
+4. [Best Practices Checklist](#best-practices-checklist)
+5. [Version Info Flow](#version-info-flow)
+6. [Common Scenarios](#common-scenarios)
+
+---
+
+## Code Organization Pattern
+
+Keep the orchestration NAME constant stable across all versions. Create separate functions for each version with a version suffix:
+
+```rust
+// names.rs - Keep NAME constant stable
+pub const MY_ORCHESTRATION: &str = "my-app::orchestration::my-orch";
+
+// my_orchestration.rs
+use duroxide::OrchestrationContext;
+use std::time::Duration;
+
+/// Version 1.0.0 - Original implementation
+pub async fn my_orchestration(
+    ctx: OrchestrationContext, 
+    input: String
+) -> Result<String, String> {
+    ctx.trace_info("Starting...");
+    // Original logic
+    ctx.schedule_timer(Duration::from_secs(60)).into_timer().await;
+    ctx.continue_as_new(input).await
+}
+
+/// Version 1.0.1 - Added version prefix to traces for debugging
+pub async fn my_orchestration_1_0_1(
+    ctx: OrchestrationContext, 
+    input: String
+) -> Result<String, String> {
+    ctx.trace_info("[v1.0.1] Starting...");
+    // Same logic with improved observability
+    ctx.schedule_timer(Duration::from_secs(60)).into_timer().await;
+    ctx.continue_as_new(input).await
+}
+
+/// Version 1.0.2 - Increased timer interval
+pub async fn my_orchestration_1_0_2(
+    ctx: OrchestrationContext, 
+    input: String
+) -> Result<String, String> {
+    ctx.trace_info("[v1.0.2] Starting...");
+    // Updated logic with longer interval
+    ctx.schedule_timer(Duration::from_secs(300)).into_timer().await;
+    ctx.continue_as_new(input).await
+}
+```
+
+**Why this pattern works:**
+- The NAME constant provides stable identity for the orchestration
+- Separate functions allow different implementations without breaking replay
+- Version suffixes make code navigation easy (`my_orch_1_0_1`, `my_orch_1_0_2`)
+- Trace prefixes like `[v1.0.2]` make log analysis straightforward
+
+---
+
+## Registration Pattern
+
+Register all versions in your registry, with the default version using `register_typed` and explicit versions using `register_versioned_typed`:
+
+```rust
+use crate::orchestrations::{
+    MY_ORCHESTRATION,
+    my_orchestration,
+    my_orchestration_1_0_1, 
+    my_orchestration_1_0_2,
+};
+
+let orchestrations = OrchestrationRegistry::builder()
+    // Default version (1.0.0) - use register_typed
+    .register_typed(MY_ORCHESTRATION, my_orchestration)
+
+    // Explicit versions - use register_versioned_typed
+    .register_versioned_typed(MY_ORCHESTRATION, "1.0.1", my_orchestration_1_0_1)
+    .register_versioned_typed(MY_ORCHESTRATION, "1.0.2", my_orchestration_1_0_2)
+
+    .build();
+```
+
+**Key points:**
+- `register_typed()` registers at version `1.0.0` by default
+- `register_versioned_typed()` registers at the specified semver version
+- The default version policy is `Latest`: new instances and `continue_as_new` executions resolve to the highest registered version
+- Use `set_version_policy()` to pin new instances to a specific version if needed
+
+---
+
+## Version Upgrade Timing
+
+**Critical concept:** Version upgrades happen at `continue_as_new` time, NOT when the server restarts.
+
+This explains why you might see a delay between deploying a new version and seeing it reflected in your UI/monitoring:
+
+```
+Timeline example:
+
+T+0:00  Server restarts with v1.0.2 registered
+        But v1.0.1 was mid-cycle with ~1 min left on its timer
+
+T+1:00  v1.0.1's timer expires, orchestration does its work
+
+T+1:01  v1.0.1 calls continue_as_new()
+        → Duroxide resolves "Latest" policy → v1.0.2
+        → Database updated: orchestration_version = "1.0.2"
+        → New execution starts with v1.0.2 code
+
+T+1:02  UI/API refreshes → now shows v1.0.2
+```
+
+**Why this matters:**
+- Duroxide doesn't interrupt running orchestrations
+- The version policy (`Latest`) is evaluated when a new execution starts
+- If you have a long timer (e.g., 5 minutes), the old version runs until that timer expires
+- This is intentional - it ensures orchestration determinism and clean execution boundaries
+
+---
+
+## Best Practices Checklist
+
+### ✅ DO
+
+1. **Keep NAME constants stable** across all versions
+   ```rust
+   pub const MY_ORCH: &str = "myapp::orchestration::my-orch"; // Never changes
+   ```
+
+2. **Use version suffix on function names**
+   ```rust
+   my_orch()       // v1.0.0
+   my_orch_1_0_1() // v1.0.1
+   my_orch_1_0_2() // v1.0.2
+   ```
+
+3. **Add version prefix to trace logs** for easier debugging
+   ```rust
+   ctx.trace_info("[v1.0.2] Starting health check cycle");
+   ctx.trace_info("[v1.0.2] Completed iteration, continuing...");
+   ```
+
+4. **Document version changes** in code comments
+   ```rust
+   /// Version 1.0.2 - Changes from 1.0.1:
+   /// - Increased timer interval from 1 min to 5 min
+   /// - Added retry logic for health checks
+   ```
+
+5. **Test version transitions** in development before production
+   - Deploy new version
+   - Verify old version completes its cycle
+   - Verify new version starts after `continue_as_new`
+
+### ❌ DON'T
+
+1. **Don't change the NAME constant** when creating new versions
+2. **Don't expect immediate version switch** after server restart
+3. **Don't use `continue_as_new_versioned()`** unless you specifically need to pin to a version
+4. **Don't delete old version code** until all instances have upgraded
+
+---
+
+## Version Info Flow
+
+Understanding how version information flows through the system:
+
+```
+                      Duroxide Runtime
+                            │
+                            ▼
+┌─────────────────────────────────────────────────┐
+│  Provider (instances table)                      │
+│  ├── orchestration_name: "my-app::my-orch"      │
+│  └── orchestration_version: "1.0.2"             │
+└─────────────────────────────────────────────────┘
+                            │
+                            ▼
+┌─────────────────────────────────────────────────┐
+│  Client.get_instance_info()                      │
+│  └── Returns InstanceInfo {                      │
+│        orchestration_version: "1.0.2"           │
+│      }                                           │
+└─────────────────────────────────────────────────┘
+                            │
+                            ▼
+┌─────────────────────────────────────────────────┐
+│  Your Backend API                                │
+│  └── /api/orchestrations/:id                    │
+│      { "version": "1.0.2", ... }                │
+└─────────────────────────────────────────────────┘
+                            │
+                            ▼
+┌─────────────────────────────────────────────────┐
+│  Your UI / Dashboard                             │
+│  └── Displays: "Version: 1.0.2"                 │
+└─────────────────────────────────────────────────┘
+```
+
+To display version in your UI:
+1. Call `client.get_instance_info(instance_id)` 
+2. Access `info.orchestration_version`
+3. Include in your API response
+4. Display in your dashboard
+
+---
+
+## Common Scenarios
+
+### Scenario 1: Hot-fix deployment
+
+You need to deploy a critical fix to a long-running orchestration:
+
+```rust
+// v1.0.1 - Critical fix for health check timeout
+pub async fn health_monitor_1_0_1(ctx: OrchestrationContext, input: String) -> Result<String, String> {
+    ctx.trace_info("[v1.0.1] Starting with increased timeout fix");
+
+    // Fixed: Increased timeout from 5s to 30s
+    let result = ctx.schedule_activity_with_retry(
+        "HealthCheck",
+        input.clone(),
+        RetryPolicy::new(3).with_timeout(Duration::from_secs(30)),
+    ).await?;
+
+    ctx.schedule_timer(Duration::from_secs(60)).into_timer().await;
+    ctx.continue_as_new(input).await
+}
+```
+
+**What happens:**
+1. Deploy server with v1.0.1 registered
+2. Running v1.0.0 instances complete their current timer
+3. On `continue_as_new`, they automatically upgrade to v1.0.1
+4. New fix takes effect without manual intervention
+
+### Scenario 2: Graceful migration with state changes
+
+If you need to transform state between versions:
+
+```rust
+// v2.0.0 - New state format
+pub async fn my_orch_2_0_0(ctx: OrchestrationContext, input: String) -> Result<String, String> {
+    ctx.trace_info("[v2.0.0] Starting with new state format");
+
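+    // Handle migration from v1 state format.
+    // serde_json errors are mapped to String because this function returns
+    // Result<String, String>; migrate_v1_to_v2 is assumed to do the same.
+    let state: MyState = if input.starts_with("{\"v1\":") {
+        ctx.trace_info("[v2.0.0] Migrating from v1 state format");
+        migrate_v1_to_v2(&input)?
+    } else {
+        serde_json::from_str(&input).map_err(|e| e.to_string())?
+    };
+
+    // Continue with v2 logic...
+    let next_input = serde_json::to_string(&state).map_err(|e| e.to_string())?;
+    ctx.continue_as_new(next_input).await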
+}
+```
+
+### Scenario 3: Pinning to exact version
+
+If you need to prevent automatic upgrades (e.g., during testing):
+
+```rust
+let orchestrations = OrchestrationRegistry::builder()
+    .register_typed(MY_ORCH, my_orchestration)
+    .register_versioned_typed(MY_ORCH, "1.0.1", my_orchestration_1_0_1)
+    .set_policy(MY_ORCH, VersionPolicy::Exact(Version::parse("1.0.0").unwrap()))
+    .build();
+```
+
+Or use `continue_as_new_versioned` for explicit control:
+
+```rust
+// Stay on current version explicitly
+ctx.continue_as_new_versioned("1.0.0", input).await
+```
+
+---
+
+## See Also
+
+- [Migration Guide](migration-guide.md) - For major version migrations
+- [Continue As New Semantics](continue-as-new.md) - Deep dive into CAN behavior
+- [ORCHESTRATION-GUIDE.md](ORCHESTRATION-GUIDE.md) - Complete orchestration reference
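
Scenario 2 in `production-patterns.md` detects v1 state with a string prefix check. A sketch of one alternative, assuming you control the payload format: tag every payload with its format version, and convert every error to `String` so that `?` works inside a function returning `Result<_, String>`. The `v1:`/`v2:` wire format, `StateV2`, `load_state`, `save_state`, and the default interval are assumptions made for this example; it uses only the standard library rather than `serde_json`.

```rust
/// Hypothetical v2 state: v1 carried only a counter, v2 adds an interval.
#[derive(Debug, PartialEq)]
struct StateV2 {
    count: u64,
    interval_secs: u64,
}

/// Interval given to migrated v1 state (an assumption for this sketch).
const DEFAULT_INTERVAL_SECS: u64 = 60;

/// Parse "v1:<count>" or "v2:<count>:<interval_secs>", migrating v1 on the way in.
fn load_state(input: &str) -> Result<StateV2, String> {
    let mut parts = input.split(':');
    // `split` always yields at least one item, so the fallback is never hit in practice.
    let tag = parts.next().unwrap_or("");
    match tag {
        "v1" => {
            let count = parse_field(parts.next(), "count")?;
            Ok(StateV2 { count, interval_secs: DEFAULT_INTERVAL_SECS })
        }
        "v2" => {
            let count = parse_field(parts.next(), "count")?;
            let interval_secs = parse_field(parts.next(), "interval_secs")?;
            Ok(StateV2 { count, interval_secs })
        }
        other => Err(format!("unknown state tag: {other:?}")),
    }
}

/// Turn a missing or malformed numeric field into a `String` error.
fn parse_field(raw: Option<&str>, name: &str) -> Result<u64, String> {
    raw.ok_or_else(|| format!("missing field: {name}"))?
        .parse::<u64>()
        .map_err(|e| format!("invalid {name}: {e}"))
}

/// Always write the newest format, so the next execution never sees v1 again.
fn save_state(state: &StateV2) -> String {
    format!("v2:{}:{}", state.count, state.interval_secs)
}
```

In an orchestration this would sit at the top of the v2 function: load the state with `load_state(&input)?`, run the cycle, then pass `save_state(&state)` to `continue_as_new`. Because the payload is rewritten as v2 on every cycle, the migration branch runs at most once per instance.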