feat(sdks): client sandbox pool implementation #393

Open
ninan-nn wants to merge 1 commit into alibaba:main from ninan-nn:feature/client_pool_impl

ninan-nn (Collaborator) commented Mar 9, 2026

Summary

  • Implement Kotlin SDK SandboxPool with client-side idle buffer management and configurable acquire policy (FAIL_FAST / DIRECT_CREATE).
  • Add pool domain model and extensibility points: PoolConfig, PoolCreationSpec, PoolSnapshot, PoolState, PoolStateStore, and InMemoryPoolStateStore.
  • Add reconcile loop with leader-gated replenish behavior for distributed deployments, including:
    • primary lock acquire/renew flow
    • stale idle fallback handling
    • degraded state/backoff tracking
    • lock-loss guard during putIdle
    • best-effort orphan cleanup callback on lock-loss window
  • Improve lifecycle/concurrency safety in SandboxPool:
    • periodic reconcile tick exception guard (prevents scheduler from silently stopping)
    • resize vs shutdown race handling (RejectedExecutionException safe path)
    • graceful shutdown waits for local in-flight operations (bounded by drainTimeout)
  • Remove the @Deprecated annotation from SandboxPool to align with README guidance, while keeping the feature marked experimental in the docs.
  • Update Kotlin SDK README (EN/ZH) with pool usage and distributed-store expectations.
  • Add/extend tests:
    • Kotlin unit tests for reconciler/store/pool behavior (including distributed-lock edge cases)
    • Java pool E2E coverage:
      • existing single-node pool suite
      • new pseudo-distributed suite with shared in-process distributed-like store
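
The "periodic reconcile tick exception guard" item addresses a documented `ScheduledExecutorService` behavior: once a periodic task throws, its subsequent executions are suppressed. The following is a minimal sketch of such a guard, not the code in this PR; `guardedTick` and `demoGuardedTicks` are hypothetical names introduced only for illustration.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger

/**
 * Hypothetical helper: wraps a tick so exceptions are reported to onError
 * instead of escaping to the scheduler, which would cancel the periodic task.
 */
fun guardedTick(onError: (Exception) -> Unit, tick: () -> Unit): Runnable =
    Runnable {
        try {
            tick()
        } catch (e: Exception) {
            onError(e)
        }
    }

/** Runs a tick that fails on its first execution; returns (runs, errors). */
fun demoGuardedTicks(): Pair<Int, Int> {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val runs = AtomicInteger()
    val errors = AtomicInteger()
    val threeRuns = CountDownLatch(3)
    val task = guardedTick(onError = { errors.incrementAndGet() }) {
        val n = runs.incrementAndGet()
        threeRuns.countDown()
        if (n == 1) throw IllegalStateException("simulated reconcile failure")
    }
    try {
        scheduler.scheduleAtFixedRate(task, 0, 10, TimeUnit.MILLISECONDS)
        // Without the guard, the first failure would stop all later runs.
        threeRuns.await(2, TimeUnit.SECONDS)
    } finally {
        scheduler.shutdownNow()
    }
    return runs.get() to errors.get()
}
```

In this sketch the task keeps running after the simulated failure, and the error is counted rather than lost. A real reconciler would presumably feed `onError` into its degraded-state and backoff tracking.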

Testing

  • Not run (explain why)
  • Unit tests
  • Integration tests
  • e2e / manual verification

Executed:

  • cd sdks/sandbox/kotlin && ./gradlew spotlessApply build
  • cd tests/java && ./gradlew test --tests "*SandboxPool*E2ETest"

Result summary:

  • SandboxPoolSingleNodeE2ETest: 15/15 passed
  • SandboxPoolPseudoDistributedE2ETest: 7/7 passed

Breaking Changes

  • None
  • Yes (describe impact and migration path)

Checklist

  • Linked Issue or clearly described motivation
  • Added/updated docs (if needed)
  • Added/updated tests (if needed)
  • Security impact considered
  • Backward compatibility considered

ninan-nn force-pushed the feature/client_pool_impl branch 11 times, most recently from 543d870 to 991e526, March 13, 2026 03:50
ninan-nn changed the title from "client pool impl" to "feat(sdks): client sandbox pool implementation", Mar 13, 2026
ninan-nn force-pushed the feature/client_pool_impl branch from 991e526 to 9ecdc1f, March 13, 2026 04:06
ninan-nn marked this pull request as ready for review, March 13, 2026 04:13

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9ecdc1f260

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

throw PoolNotRunningException("Cannot acquire when pool state is $state")
}
val poolName = config.poolName
val sandboxId = stateStore.tryTakeIdle(poolName)

[P1] Retry next idle sandbox before raising POOL_EMPTY

acquire() only calls tryTakeIdle once, so if that first idle ID is stale/unreachable it immediately falls through to FAIL_FAST/direct-create instead of trying other idle candidates still in the pool. In a queue like [stale, healthy], FAIL_FAST incorrectly throws POOL_EMPTY and DIRECT_CREATE needlessly creates a new sandbox, which breaks expected pool behavior and increases cost/latency.
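
One way to address this is to keep taking idle candidates until one passes a health check or the store runs dry. The sketch below is illustrative only: `IdleStore`, `QueueIdleStore`, `takeHealthyIdle`, `maxAttempts`, and `isHealthy` are hypothetical names, and only `tryTakeIdle(poolName)` mirrors the call shown in the diff (its nullable return type is an assumption).

```kotlin
/** Hypothetical stand-in for the store; the PR's PoolStateStore may differ. */
interface IdleStore {
    fun tryTakeIdle(poolName: String): String?
}

/** In-memory FIFO store used only to exercise the sketch. */
class QueueIdleStore(ids: List<String>) : IdleStore {
    private val queue = ArrayDeque(ids)
    override fun tryTakeIdle(poolName: String): String? = queue.removeFirstOrNull()
}

/**
 * Takes idle IDs until one is healthy, the store is empty, or maxAttempts
 * candidates have been discarded. Returns null when no usable ID was found,
 * at which point the caller would apply FAIL_FAST or DIRECT_CREATE.
 */
fun takeHealthyIdle(
    store: IdleStore,
    poolName: String,
    maxAttempts: Int,
    isHealthy: (String) -> Boolean,
): String? {
    repeat(maxAttempts) {
        val id = store.tryTakeIdle(poolName) ?: return null // store is empty
        if (isHealthy(id)) return id
        // Stale candidate: drop it and try the next one.
    }
    return null
}
```

With a queue of `[stale, healthy]`, this sketch returns `healthy` rather than falling through to the acquire policy. Bounding the loop with `maxAttempts` is a design choice made here to keep acquire latency predictable when many idle entries are stale.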

)
return
}
stateStore.putIdle(poolName, sandboxId)

[P1] Handle putIdle failures to avoid orphaning new sandboxes

The reconcile loop does not guard stateStore.putIdle, so a transient state-store exception after successful creation exits the tick with already-created sandbox IDs that are neither stored as idle nor cleaned up via onOrphanedCreated. This leaks running sandboxes and also skips failure accounting for that path, so degraded state can be under-reported during store outages.
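
A guarded version of that step might look like the following sketch. It assumes a store exposing `putIdle(poolName, sandboxId)` as in the diff and an `onOrphanedCreated` callback taking a sandbox ID; `IdleSink`, `PutIdleResult`, and `putIdleGuarded` are hypothetical names, and the real callback signature in the PR may differ.

```kotlin
/** Hypothetical minimal view of the store; the PR's interface may have more members. */
interface IdleSink {
    fun putIdle(poolName: String, sandboxId: String)
}

/** IDs that were stored as idle, and IDs whose putIdle call failed. */
data class PutIdleResult(val stored: List<String>, val orphaned: List<String>)

/**
 * Stores each created sandbox as idle. On a store failure the ID is recorded
 * as orphaned and handed to the cleanup callback, so the tick neither leaks
 * the sandbox nor aborts before the remaining IDs are processed.
 */
fun putIdleGuarded(
    store: IdleSink,
    poolName: String,
    createdIds: List<String>,
    onOrphanedCreated: (String) -> Unit,
): PutIdleResult {
    val stored = mutableListOf<String>()
    val orphaned = mutableListOf<String>()
    for (id in createdIds) {
        try {
            store.putIdle(poolName, id)
            stored += id
        } catch (e: Exception) {
            orphaned += id
            // Best-effort cleanup: a failing callback must not abort the loop.
            runCatching { onOrphanedCreated(id) }
        }
    }
    return PutIdleResult(stored, orphaned)
}
```

The caller could treat a non-empty `orphaned` list as a failed tick for degraded-state and backoff accounting, which would cover the under-reporting concern raised above.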

* Configuration for a client-side sandbox pool.
*
* @property poolName User-defined name and namespace for this logical pool (required).
* @property ownerId Unique process identity for primary lock ownership. If not provided, a UUID-based default is generated.
Collaborator comment:

Why is it called ownerId? What about using poolId instead?
