Add pre-match queue guard for internal matching engine (T2) #1424

Closed
sehyunc wants to merge 1 commit into codex/implement-internal-match-order-validity-guard from codex/ring0-pre-match-queue-guard
@sehyunc sehyunc commented Mar 31, 2026

Problem

After a successful internal match settlement, RunMatchingEngineHook fires two matching engine jobs — one for each party's residual order. Both jobs race to find the same counterparty and attempt settlement. The loser hits "serial preemption not allowed" because the winner already preempted the queue:

ME Job 1 (order A) → finds B → preempt_serial(A, B) → ✓
ME Job 2 (order B) → finds A → preempt_serial(B, A) → ✗ "serial preemption not allowed"

PR #1420 added a post-failure order_still_valid guard that catches this and stops retrying, but the matching engine still does full candidate search, match computation, and a failed settlement attempt before bailing out.
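The control flow described above can be sketched roughly as follows. This is a minimal model, not the code from PR #1420: `State`, `JobOutcome`, and `run_job` are hypothetical names, and `order_still_valid` is assumed to reduce to a serial-queue-length read. The point it illustrates is that the validity check only runs after the settlement attempt has already failed.

```rust
use std::collections::HashMap;

/// Hypothetical outcome of one matching-engine job.
#[derive(Debug, PartialEq, Eq)]
pub enum JobOutcome {
    /// Settlement succeeded.
    Settled,
    /// Settlement failed and the order is no longer matchable: stop retrying.
    Abandoned,
    /// Settlement failed but the order is still valid: the caller may retry.
    Retry,
}

/// Hypothetical stand-in for the state layer: serial queue length per account.
pub struct State {
    pub serial_queue_len: HashMap<u32, usize>,
}

impl State {
    /// Assumed semantics: an order stays matchable only while its
    /// account's serial queue is empty.
    pub fn order_still_valid(&self, account: u32) -> bool {
        self.serial_queue_len.get(&account).copied().unwrap_or(0) == 0
    }
}

/// Post-failure guard: candidate search and match computation (elided here)
/// have already been paid for by the time `settle` runs and fails.
pub fn run_job(
    state: &State,
    account: u32,
    settle: impl FnOnce() -> Result<(), String>,
) -> JobOutcome {
    match settle() {
        Ok(()) => JobOutcome::Settled,
        // The guard is consulted only after the failure.
        Err(_) if !state.order_still_valid(account) => JobOutcome::Abandoned,
        Err(_) => JobOutcome::Retry,
    }
}
```

In this model the losing job still pays for everything up to and including `settle()`, which is the wasted work this PR set out to remove.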

Options

Option 1: Guard at enqueue time (match v1 exactly)
Add a queue-empty check inside RunMatchingEngineHook::run() before sending the job to the channel. Mirrors v1's should_run_matching_engine() which gates inside the Raft transaction that pops the settlement task.

  • Pro: prevents the job from entering the channel at all
  • Con: hooks don't currently have state access; would require threading State through the hook trait, changing the TaskHook interface

Option 2: Guard at execution time (pre-match check)
Add a queue-length check at the top of run_internal_matching_engine() — bail immediately if the account has tasks in flight. Also check the counterparty's queue in try_settle_match() before attempting settlement.

Option 3: Rely solely on PR #1420 (no additional guard)
The post-failure order_still_valid check already catches the busy-queue case.

  • Pro: zero new code
  • Con: every rematch does candidate search → match → failed settlement → validity check before bailing. Noisy error logs ("serial preemption not allowed") on every successful partial fill.

Rationale

Chose Option 2. It eliminates the wasted work and error-level log noise with a single queue-length read at the top of the matching engine, the same check order_still_valid already performs, just earlier. The cost of the job still entering the channel is negligible.

This matches the semantics of v1's should_run_matching_engine() (state/src/applicator/task_queue.rs:281) without requiring v1's mechanism (Raft-atomic gating at enqueue time). In v1, the matching engine could never run unless the wallet's serial queue was confirmed empty inside the same Raft write transaction that popped the settlement task. V2's hook architecture fires unconditionally, so we add the equivalent guard at execution time instead.

Changes

In internal_engine.rs:

  1. Pre-match guard — top of run_internal_matching_engine(): skip if own account's serial_tasks_queue_len > 0
  2. Pre-settlement guard — in try_settle_match(): skip if counterparty's serial_tasks_queue_len > 0
  3. Test for the pre-match early-exit path
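A simplified sketch of the two guards, for readers who want the shape without opening the diff. The function names `run_internal_matching_engine`, `try_settle_match`, and `serial_tasks_queue_len` come from this PR, but the signatures, the `State` struct, the `MatchResult` enum, and the candidate list are assumptions made for illustration. The real functions take different arguments.

```rust
use std::collections::HashMap;

pub type AccountId = u32;

/// Hypothetical stand-in for the state layer.
#[derive(Default)]
pub struct State {
    serial_queue_len: HashMap<AccountId, usize>,
}

impl State {
    /// Test helper (hypothetical): set an account's serial queue length.
    pub fn set_queue_len(&mut self, account: AccountId, len: usize) {
        self.serial_queue_len.insert(account, len);
    }

    /// Number of serial tasks queued for `account`; zero if none recorded.
    pub fn serial_tasks_queue_len(&self, account: AccountId) -> usize {
        self.serial_queue_len.get(&account).copied().unwrap_or(0)
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum MatchResult {
    SkippedOwnQueueBusy,
    SkippedCounterpartyBusy,
    Settled,
    NoCandidate,
}

/// Pre-settlement guard: skip a counterparty whose serial queue is non-empty.
pub fn try_settle_match(state: &State, counterparty: AccountId) -> MatchResult {
    if state.serial_tasks_queue_len(counterparty) > 0 {
        return MatchResult::SkippedCounterpartyBusy;
    }
    // The actual settlement would happen here (elided).
    MatchResult::Settled
}

/// Pre-match guard: bail before any candidate search if our own queue is busy.
pub fn run_internal_matching_engine(
    state: &State,
    account: AccountId,
    candidates: &[AccountId],
) -> MatchResult {
    if state.serial_tasks_queue_len(account) > 0 {
        return MatchResult::SkippedOwnQueueBusy;
    }
    let mut last = MatchResult::NoCandidate;
    for &counterparty in candidates {
        last = try_settle_match(state, counterparty);
        if last == MatchResult::Settled {
            break;
        }
    }
    last
}
```

Both guards key on raw queue length, so any queued task blocks matching, whatever its kind.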

Testing


claude bot commented Mar 31, 2026

Claude encountered an error.


I'll analyze this and get back to you.

@sehyunc sehyunc force-pushed the codex/ring0-pre-match-queue-guard branch from 8bae126 to 6506171 Compare March 31, 2026 22:21
@sehyunc sehyunc force-pushed the codex/ring0-pre-match-queue-guard branch from 6506171 to 4336acd Compare March 31, 2026 22:34
@sehyunc sehyunc force-pushed the codex/ring0-pre-match-queue-guard branch from 4336acd to 4b102ad Compare March 31, 2026 22:35
sehyunc commented Apr 1, 2026

Closing: Investigation determined that the pre-match queue guards are redundant.

The v2 state layer already handles settlement races correctly through atomic multi-queue locking in preempt_queue_with_serial. When two ME jobs race to settle the same order pair, one wins and the other receives a "serial preemption not allowed" rejection — which is handled gracefully.
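To illustrate why the race is safe, here is a toy model of atomic two-queue preemption. It is not the real preempt_queue_with_serial: `TaskQueues`, `preempt_pair`, and `race` are hypothetical, and a single mutex over a set of claimed accounts stands in for whatever locking the state layer actually uses. What matters is that both queues are checked and claimed in one critical section, so exactly one of two racing jobs can succeed.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical model: the set of accounts whose queues are currently preempted.
pub struct TaskQueues {
    // One lock guards every queue, so a pair is checked and claimed atomically.
    preempted: Mutex<HashSet<u32>>,
}

impl TaskQueues {
    pub fn new() -> Self {
        Self { preempted: Mutex::new(HashSet::new()) }
    }

    /// Claim both queues or neither. The loser of a race gets an error
    /// instead of a half-acquired pair.
    pub fn preempt_pair(&self, a: u32, b: u32) -> Result<(), String> {
        let mut held = self.preempted.lock().unwrap();
        if held.contains(&a) || held.contains(&b) {
            return Err("serial preemption not allowed".to_string());
        }
        held.insert(a);
        held.insert(b);
        Ok(())
    }
}

/// Two jobs race on the same order pair from opposite sides.
/// Returns whether each job's preemption succeeded.
pub fn race(queues: &TaskQueues) -> (bool, bool) {
    std::thread::scope(|s| {
        let job1 = s.spawn(|| queues.preempt_pair(1, 2).is_ok());
        let job2 = s.spawn(|| queues.preempt_pair(2, 1).is_ok());
        (job1.join().unwrap(), job2.join().unwrap())
    })
}
```

Whichever thread takes the lock first wins, and the other sees the pair already claimed and returns the rejection, which the caller can treat as a normal outcome rather than an error.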

The guards were also causing a bug where RefreshAccount tasks (which don't conflict with settlement) blocked matching entirely.
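The failure mode can be seen in a small model. `TaskKind`, `length_guard_allows_match`, and `conflict_aware_allows_match` are hypothetical names, and the assumption that only settlement tasks conflict with settlement is a simplification for illustration. The second function is shown only for contrast. It is not what was adopted, since the PR was closed in favor of relying on the state layer's preemption check.

```rust
/// Hypothetical task kinds that can sit in an account's serial queue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskKind {
    RefreshAccount,
    SettleMatch,
}

impl TaskKind {
    /// Assumption for this sketch: only settlement tasks conflict with settlement.
    pub fn conflicts_with_settlement(self) -> bool {
        matches!(self, TaskKind::SettleMatch)
    }
}

/// The guard as proposed in this PR: any queued task blocks matching.
pub fn length_guard_allows_match(queue: &[TaskKind]) -> bool {
    queue.is_empty()
}

/// Contrast case: block only on tasks that actually conflict.
pub fn conflict_aware_allows_match(queue: &[TaskKind]) -> bool {
    !queue.iter().any(|task| task.conflicts_with_settlement())
}
```

With a queue holding only `RefreshAccount`, the length-based guard refuses to match while the conflict-aware check allows it.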

See investigation in renegade-map for details.

@sehyunc sehyunc closed this Apr 1, 2026