test: concurrency and race condition stress tests (#705)#825
test: concurrency and race condition stress tests (#705)#825Chris0Jeky wants to merge 9 commits intomainfrom
Conversation
Add QueueClaimRaceTests with 4 scenarios: - 10 parallel workers claiming same LLM queue item (SQLite serialization documented) - Capture triage with stale timestamp (no duplicate proposals) - Batch concurrent workers (no item processed twice) - Two workers processing different items simultaneously Part of #705 (TST-55).
Add CardUpdateConflictTests with 5 scenarios: - Concurrent card moves to different columns - Stale-write detection via ExpectedUpdatedAt (409 Conflict) - Last-writer-wins without stale check - Column reorder race (consistent final state) - Concurrent card creation in same column (no duplicates) Part of #705 (TST-55).
Add ProposalApprovalRaceTests with 4 scenarios: - Double-approve prevention (at least one succeeds, loser gets 409) - Approve + Expire race (simulated via concurrent approve + reject) - Approve + Reject race (one wins cleanly) - Double-execute prevention (no duplicate side effects) Part of #705 (TST-55).
Add WebhookDeliveryConcurrencyTests with 2 scenarios: - Concurrent board mutations each create delivery record - Concurrent webhook subscription creation (all distinct IDs) Part of #705 (TST-55).
Add BoardPresenceConcurrencyTests with 2 scenarios: - Rapid join/leave stress (eventually consistent presence snapshot) - Disconnect during edit clears editing state Part of #705 (TST-55).
Add RateLimitingConcurrencyTests with 3 scenarios: - Burst beyond limit (correct number throttled) - Cross-user isolation (user A throttled doesn't affect user B) - Retry-After header verification on throttled requests Part of #705 (TST-55).
Add CrossUserIsolationStressTests with 2 scenarios: - Concurrent board creation by 5 users (no cross-user contamination) - Concurrent capture item creation (user-scoped data isolation) Part of #705 (TST-55).
- Replace Task.Delay(2000) with proper polling loop in batch test - Wrap client disposal in try/finally to prevent resource leaks
Adversarial Self-ReviewIssues Found and Fixed
Remaining Observations (Not Bugs)
|
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive suite of concurrency and stress tests covering SignalR board presence, card update conflicts, cross-user data isolation, proposal workflows, queue claiming, rate limiting, and webhook delivery. The review feedback focuses on improving test reliability and maintainability by replacing manual delays and polling loops with the existing PollUntilAsync helper, and strengthening assertions in rate-limiting scenarios to ensure expected behaviors are explicitly verified.
| // Use PollUntilAsync-style polling instead of Task.Delay to avoid flakiness. | ||
| var deadline = DateTimeOffset.UtcNow + TimeSpan.FromSeconds(15); | ||
| List<ProposalDto>? proposals = null; | ||
| while (DateTimeOffset.UtcNow < deadline) | ||
| { | ||
| var proposalsResp = await client.GetAsync($"/api/automation/proposals?boardId={board.Id}"); | ||
| proposalsResp.StatusCode.Should().Be(HttpStatusCode.OK); | ||
| proposals = await proposalsResp.Content.ReadFromJsonAsync<List<ProposalDto>>(); | ||
| if (proposals != null && proposals.Count >= captureIds.Count) | ||
| break; | ||
| await Task.Delay(200); | ||
| } | ||
|
|
||
| proposals.Should().NotBeNull(); |
There was a problem hiding this comment.
Using a hardcoded Task.Delay makes the test non-deterministic and potentially slow. It's better to use PollUntilAsync to wait for the expected state. Additionally, the test should verify that each item was processed exactly once (count == 1) rather than just 'at most once', to ensure no data loss occurred during the race.
var proposals = await ApiTestHarness.PollUntilAsync(
async () =>
{
var resp = await client.GetAsync($"/api/automation/proposals?boardId={board.Id}");
return await resp.Content.ReadFromJsonAsync<List<ProposalDto>>();
},
proposals => proposals?.Count >= batchSize,
"batch proposal creation");
// Each capture item should have exactly one proposal (no duplicates, no losses)
foreach (var captureId in captureIds)
{
var matching = proposals!.Count(p => p.SourceReferenceId == captureId.ToString());
matching.Should().Be(1,
$"capture item {captureId} should have exactly one proposal");
}| var deadline = DateTimeOffset.UtcNow + TimeSpan.FromSeconds(10); | ||
| IReadOnlyList<OutboundWebhookDelivery> deliveries = []; | ||
| while (DateTimeOffset.UtcNow < deadline) | ||
| { | ||
| deliveries = await deliveryRepo.GetBySubscriptionAsync( | ||
| webhookSub!.Subscription.Id, limit: mutationCount + 5); | ||
| if (deliveries.Count >= mutationCount) | ||
| break; | ||
| await Task.Delay(100); | ||
| } |
There was a problem hiding this comment.
This manual polling loop can be replaced with the ApiTestHarness.PollUntilAsync helper for better readability and consistency with other tests in the suite.
var deliveries = await ApiTestHarness.PollUntilAsync(
() => deliveryRepo.GetBySubscriptionAsync(webhookSub!.Subscription.Id, limit: mutationCount + 5),
d => d.Count >= mutationCount,
"webhook delivery records");| if (second.StatusCode == (HttpStatusCode)429) | ||
| { | ||
| // Retry-After header should be present on 429 responses | ||
| second.Headers.Contains("Retry-After").Should().BeTrue( | ||
| "429 responses should include a Retry-After header"); | ||
| } |
There was a problem hiding this comment.
The test currently only checks for the Retry-After header if the status code happens to be 429. It should explicitly assert that the status code is 429 to ensure the rate limiter is actually working as expected for this scenario.
second.StatusCode.Should().Be((HttpStatusCode)429, "the second request should be throttled");
second.Headers.Contains("Retry-After").Should().BeTrue("429 responses should include a Retry-After header");There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bcf6513dd9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (second.StatusCode == (HttpStatusCode)429) | ||
| { |
There was a problem hiding this comment.
Assert throttling before validating Retry-After
This test only validates Retry-After inside a conditional branch, so it silently passes when the second login is not throttled at all (for example if rate-limiting is misconfigured). That means the scenario in the test name can regress without any failure signal. Add an explicit assertion that the second response is 429 TooManyRequests before checking headers.
Useful? React with 👍 / 👎.
| // No 500 errors | ||
| responseData.Should().NotContain(r => r.Status == HttpStatusCode.InternalServerError, | ||
| "no internal server errors during concurrent processing"); |
There was a problem hiding this comment.
Enforce distinct queue claims in two-worker race test
The test is named to verify that two workers claim different pending items, but the only assertion is that no response is 500. This allows the test to pass even if one worker gets no item or both workers process the same item, so it does not protect the invariant it documents. Assert the expected successful claim count and that claimed item identifiers are distinct.
Useful? React with 👍 / 👎.
| var otherUserBoards = boards!.Where(b => | ||
| userBoards.Any(kv => kv.Key != user.Username && kv.Value == b.Id)); | ||
| if (otherUserBoards.Any()) |
There was a problem hiding this comment.
Remove race in cross-user contamination detection
This leakage check depends on userBoards, which is being populated concurrently by other tasks. If a user can see another user's board before that board ID is inserted into userBoards, otherUserBoards.Any() remains false and the test misses the isolation breach. Perform board creation first, then validate visibility against the complete set of known board IDs to avoid this timing-dependent false negative.
Useful? React with 👍 / 👎.
…ock risk - Replace Barrier.SignalAndWait (blocking) with SemaphoreSlim (async-safe) in WebhookDeliveryConcurrencyTests and BoardPresenceConcurrencyTests to prevent thread-pool starvation deadlocks under CI - Fix cross-user isolation test race: separate board creation from verification so the check runs against the complete set of board IDs - Assert 429 status code explicitly in ThrottledRequests test instead of silently passing when rate limiting is broken - Strengthen ProcessNext_TwoWorkersTwoItems to assert at least one success and valid status codes (not just "no 500s") - Assert exactly 1 card in DoubleExecute test (was 0-1, hiding data loss) - Assert exactly 1 proposal per batch item (was <=1, hiding data loss)
Round 2 Adversarial ReviewI found 6 real issues and pushed fixes in commit eee12bf. All 39 concurrency tests pass after the fixes. Critical (1)1. Barrier.SignalAndWait blocks thread-pool threads inside async lambdas -- deadlock risk
The rest of the PR correctly uses Important (4)2. Cross-user isolation check races against concurrent inserts (false negative)
3. ThrottledRequests_IncludeRetryAfterHeader silently passes when rate limiting is broken The 4. DoubleExecute asserts 0-1 cards instead of exactly 1 (hides data loss)
5. Batch triage asserts "at most one" instead of "exactly one" per item (hides data loss)
Minor (2)6. ProcessNext_TwoWorkersTwoItems only asserts "no 500s" -- does not verify the invariant in its name The test is called 7. Significant overlap with existing ConcurrencyRaceConditionStressTests.cs (1195 lines) The self-review acknowledges this but does not address it. The existing file covers queue claim races, card update conflicts, proposal approval races, rate limiting, webhook delivery, board presence, and cross-user isolation -- essentially every scenario in the new What's Good
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eee12bfdcd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| var throttledCount = codes.Count(s => s == (HttpStatusCode)429); | ||
| throttledCount.Should().BeGreaterOrEqualTo(burstSize - permitLimit, | ||
| $"with permit limit {permitLimit} and burst {burstSize}, " + | ||
| $"at least {burstSize - permitLimit} requests should be throttled"); |
There was a problem hiding this comment.
Assert burst outcomes are limited to expected status codes
This test only checks how many 429 responses occurred and never validates the other response codes. If the auth path regresses under concurrency (for example some requests return 500), the test can still pass as long as enough requests are throttled, which masks a real failure mode. Assert that all responses are from the expected set for invalid logins under rate limiting (e.g. 401/429) so the stress test fails on unexpected server errors.
Useful? React with 👍 / 👎.
| okCount.Should().BeGreaterOrEqualTo(1, | ||
| "at least one execute should succeed"); |
There was a problem hiding this comment.
Constrain losing execute response codes in race test
The concurrent execute assertion only requires at least one 200 OK, so a run where the second request fails with 500 InternalServerError still passes if final proposal/card state happens to look correct. That weakens the test against real race-condition regressions in the execute endpoint. Add an assertion that non-OK responses are only expected conflict/idempotent outcomes and explicitly reject internal server errors.
Useful? React with 👍 / 👎.
Summary
Concurrency/subfolderTask.WhenAllwithSemaphoreSlim/Barrierfor truly simultaneous executionTest files added:
QueueClaimRaceTests.cs(4 tests) -- 10-worker claim race, stale timestamp, batch processingCardUpdateConflictTests.cs(5 tests) -- concurrent moves, stale-write 409, last-writer-wins, column reorder, concurrent card creationProposalApprovalRaceTests.cs(4 tests) -- double-approve, approve+expire, approve+reject, double-executeWebhookDeliveryConcurrencyTests.cs(2 tests) -- concurrent deliveries, concurrent subscriptionsBoardPresenceConcurrencyTests.cs(2 tests) -- rapid join/leave, disconnect during editRateLimitingConcurrencyTests.cs(3 tests) -- burst beyond limit, cross-user isolation, Retry-After headerCrossUserIsolationStressTests.cs(2 tests) -- concurrent board creation, concurrent capture isolationCloses #705
Test plan