@anmarchenko
Member

@anmarchenko anmarchenko commented Jan 30, 2026

Summary

  • Add --ci-node-workers setting (default: CPU count) to control how many parallel workers run within each CI node
  • When running in CI-node mode with multiple workers, tests assigned to the node are split among local workers and executed in parallel via errgroup
  • {{nodeIndex}} in worker-env is now a global worker index calculated as ciNode * ciNodeWorkers + localWorkerIndex, ensuring unique indices across all workers on all nodes
  • Fix edge case in parallelism calculation when maxParallelism is 0 or negative
  • Fix bundle info parsing in SanityCheck to use regex matching, making it more reliable when debug logs are present in output
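The index formula and the parallel fan-out described above can be sketched as follows. This is a minimal illustration, not the PR's code: `globalWorkerIndex` and `runWorkers` are hypothetical names, and the standard library's `sync.WaitGroup` stands in for errgroup so the example has no external dependencies.

```go
package main

import (
	"fmt"
	"sync"
)

// globalWorkerIndex applies the formula from the summary:
// ciNode * ciNodeWorkers + localWorkerIndex.
func globalWorkerIndex(ciNode, ciNodeWorkers, localWorkerIndex int) int {
	return ciNode*ciNodeWorkers + localWorkerIndex
}

// runWorkers (hypothetical helper) starts one goroutine per local worker,
// passes each its global index, waits for all of them, and returns the
// first error that any worker reported.
func runWorkers(ciNode, ciNodeWorkers int, run func(globalIndex int) error) error {
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for local := 0; local < ciNodeWorkers; local++ {
		wg.Add(1)
		go func(local int) {
			defer wg.Done()
			idx := globalWorkerIndex(ciNode, ciNodeWorkers, local)
			if err := run(idx); err != nil {
				once.Do(func() { firstErr = err })
			}
		}(local)
	}
	wg.Wait()
	return firstErr
}

func main() {
	var mu sync.Mutex
	seen := map[int]bool{}
	err := runWorkers(1, 2, func(idx int) error {
		mu.Lock()
		defer mu.Unlock()
		seen[idx] = true
		return nil
	})
	// With ci-node=1 and 2 workers, the workers see indices 2 and 3.
	fmt.Println(seen[2], seen[3], err)
}
```

Unlike errgroup with a context, this sketch does not cancel the remaining workers when one fails; it only reports the first error after all workers finish.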

Test plan

  • Added tests for splitTestFilesIntoGroups() helper (even/uneven splits, edge cases)
  • Added tests verifying global index calculation (e.g., ci-node=1 with 2 workers yields indices 2 and 3)
  • Added tests for single-worker and multi-worker CI-node modes
  • Added tests for ci_node_workers setting defaults and env/flag overrides
  • Added tests for maxParallelism <= 0 edge case
  • Added test for bundle info output with debug logs
  • make test and make lint pass
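For readers unfamiliar with the even/uneven split cases mentioned in the test plan, here is a minimal sketch of one way to divide files among workers. `splitIntoGroups` is a hypothetical stand-in; the round-robin strategy and the fallback for non-positive group counts are assumptions, not necessarily what splitTestFilesIntoGroups() does.

```go
package main

import "fmt"

// splitIntoGroups (hypothetical) distributes files round-robin so that
// group sizes differ by at most one. Groups stay empty when there are
// fewer files than groups.
func splitIntoGroups(files []string, groups int) [][]string {
	if groups <= 0 {
		groups = 1 // assumption: fall back to a single group
	}
	out := make([][]string, groups)
	for i, f := range files {
		out[i%groups] = append(out[i%groups], f)
	}
	return out
}

func main() {
	// 83 placeholder file names split among 4 workers.
	files := make([]string, 83)
	for i := range files {
		files[i] = fmt.Sprintf("spec/file_%d_spec.rb", i)
	}
	sizes := []int{}
	for _, g := range splitIntoGroups(files, 4) {
		sizes = append(sizes, len(g))
	}
	fmt.Println(sizes)
}
```

An uneven split (83 files, 4 groups) puts the remainder in the first groups, giving sizes of 21, 21, 21, and 20.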

E2E verification steps

  • Run the forem test suite with ddtest, setting DD_TEST_OPTIMIZATION_RUNNER_CI_NODE=1 to execute in CI-node mode (only one part of the split) and DD_TEST_OPTIMIZATION_RUNNER_CI_NODE_WORKERS=4 to sub-split the node's tests across 4 workers
  • Make sure the tests run successfully: no test failures, exit code 0
  • Inspect worker-env in the logs for each worker: does every worker get unique variables?
  • Inspect the logs for evidence of the test split: is each test file executed only once?
  • Inspect the logs for any warnings or errors
  • Inspect the test sessions submitted to Datadog: there must be 4 of them

@anmarchenko anmarchenko requested a review from a team as a code owner January 30, 2026 13:01
@anmarchenko anmarchenko changed the title from "Add local parallelism for CI nodes" to "[SDTEST-2702] Add local parallelism for CI nodes" Jan 30, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 477c53b619

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@anmarchenko
Member Author

E2E Test Report: SUCCESS ✅

Tested by: Shepherd Agent (autonomous QA for Datadog Test Optimization)

Test Environment

  • Method: Local testing with rubocop playground (RSpec, 663 test files)
  • Revision: 30e67ba
  • Branch: anmarchenko/ci_node_parallelism

Configuration

DD_TEST_OPTIMIZATION_RUNNER_CI_NODE=1
DD_TEST_OPTIMIZATION_RUNNER_CI_NODE_WORKERS=4
DD_TEST_OPTIMIZATION_RUNNER_WORKER_ENV=MY_WORKER_ID=worker_{{nodeIndex}};TEST_DB=db_{{nodeIndex}}

Results

| Verification Point | Expected | Actual | Status |
|---|---|---|---|
| CI node parallel mode activated | Yes | ciNode=1 ciNodeWorkers=4 | |
| Test files split across workers | 83 files for node 1 | 21 + 21 + 21 + 20 = 83 | |
| Unique global worker indices | 10000-10003 | 10000, 10001, 10002, 10003 | |
| Worker-env {{nodeIndex}} substitution | Unique per worker | Each worker received unique values | |
| 4 test sessions submitted | 4 sessions | 4 unique session IDs | |
| Tests run successfully | Exit code 0 | Exit code 0 | |
| No ddtest-related warnings/errors | None | None | |

Worker-Env Verification

| Global Index | MY_WORKER_ID | TEST_DB | Test Files |
|---|---|---|---|
| 10000 | worker_10000 | db_10000 | 21 |
| 10001 | worker_10001 | db_10001 | 21 |
| 10002 | worker_10002 | db_10002 | 21 |
| 10003 | worker_10003 | db_10003 | 20 |
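
The substitution shown in the table can be reproduced with a small sketch. `expandWorkerEnv` is a hypothetical helper written for illustration; the semicolon-separated `KEY=value` format is taken from the configuration in this report, and skipping malformed entries is an assumption rather than documented ddtest behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expandWorkerEnv (hypothetical) replaces every {{nodeIndex}} placeholder
// with the given index and parses the semicolon-separated KEY=value pairs.
func expandWorkerEnv(template string, nodeIndex int) map[string]string {
	env := make(map[string]string)
	expanded := strings.ReplaceAll(template, "{{nodeIndex}}", strconv.Itoa(nodeIndex))
	for _, pair := range strings.Split(expanded, ";") {
		key, value, ok := strings.Cut(pair, "=")
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			continue // assumption: malformed entries are skipped
		}
		env[key] = value
	}
	return env
}

func main() {
	template := "MY_WORKER_ID=worker_{{nodeIndex}};TEST_DB=db_{{nodeIndex}}"
	for idx := 10000; idx <= 10003; idx++ {
		env := expandWorkerEnv(template, idx)
		fmt.Println(idx, env["MY_WORKER_ID"], env["TEST_DB"])
	}
}
```

Each iteration prints one row of the table: the global index followed by the matching `MY_WORKER_ID` and `TEST_DB` values.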

Test Methodology

  1. Built ddtest from anmarchenko/ci_node_parallelism branch
  2. Ran rubocop test suite with CI node mode (CI_NODE=1, CI_NODE_WORKERS=4)
  3. Verified global index calculation: ciNode * 10000 + localWorkerIndex
  4. Verified {{nodeIndex}} placeholder substitution in worker-env
  5. Confirmed 4 unique test sessions were created
  6. All 2078 tests passed (540 + 522 + 549 + 467)
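
The arithmetic in steps 3 and 6 can be checked with a short sketch. It uses the formula exactly as stated in step 3 of this report (`ciNode * 10000 + localWorkerIndex`); `reportGlobalIndex` and `nodeStride` are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// nodeStride is the multiplier quoted in step 3 of the report.
const nodeStride = 10000

// reportGlobalIndex (hypothetical) applies the report's formula:
// ciNode * 10000 + localWorkerIndex.
func reportGlobalIndex(ciNode, localWorkerIndex int) int {
	return ciNode*nodeStride + localWorkerIndex
}

func main() {
	// Per-worker test counts from step 6 of the report.
	testsPerWorker := []int{540, 522, 549, 467}
	total := 0
	for local, n := range testsPerWorker {
		fmt.Printf("worker %d: global index %d, %d tests\n", local, reportGlobalIndex(1, local), n)
		total += n
	}
	fmt.Println("total tests:", total)
}
```

With a fixed stride, indices stay unique across nodes as long as each node runs fewer than 10000 workers.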

Key Observations

  • Global worker index formula works correctly for CI node 1: indices 10000, 10001, 10002, 10003
  • Test files are evenly distributed across workers (21/21/21/20)
  • Worker environment variables correctly substituted with unique global indices
  • Each worker creates its own independent test session

This E2E test was performed by Shepherd - autonomous QA agent for Datadog Test Optimization

2 participants