Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions redis/requeue.lua
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ local queue_key = KEYS[3]
local zset_key = KEYS[4]
local worker_queue_key = KEYS[5]
local owners_key = KEYS[6]
local error_reports_key = KEYS[7]

local max_requeues = tonumber(ARGV[1])
local global_max_requeues = tonumber(ARGV[2])
Expand Down Expand Up @@ -31,6 +32,8 @@ end
redis.call('hincrby', requeues_count_key, '___total___', 1)
redis.call('hincrby', requeues_count_key, test, 1)

redis.call('hdel', error_reports_key, test)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're doing hdel here, the transaction.hdel(key('error-reports'), id) becomes redundant.

But also since they were in a transaction, I don't see what this changes.

Copy link
Contributor Author

@ChrisBr ChrisBr Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is requeue which is not in a transaction. It's called from the queue.

  1. We call requeue from queue
  2. We go through the reporters
  3. BuildStatusRecorder will call record_success which will remove the error report

This opens a window for a race because as soon as it's requeued a new worker can run it and report a new error report which could be deleted again by the worker which did the requeue.

It's also not redundant because we now call record_requeue (instead of record_success) which will only report the status. We still need to do the delete in record_success which is in a transaction with acknowledge already.

if failed && CI::Queue.requeueable?(result) && queue.requeue(example)
result.requeue!
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also not redundant

In record_success it very much seem redundant:

            @queue.acknowledge(id, pipeline: transaction)
            transaction.hdel(key('error-reports'), id)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we call requeue we don't call record_success anymore but record_requeue which is only setting the stats.

        def record_requeue(id, stats: nil)
          redis.pipelined do |pipeline|
           record_stats(stats, pipeline: pipeline)
          end
        end

Copy link
Contributor Author

@ChrisBr ChrisBr Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we were calling record_success on requeues and were

  1. deleting the error
  2. setting the stats

As we now delete the error report from inside the requeue lua script I've created a dedicated method.

The alternative would be to alter record_success to something like this

        def record_success(id, stats: nil, skip_flaky_record: false, acknowledge: true)
          _, error_reports_deleted_count, requeued_count, _ = redis.multi do |transaction|
            if acknowledge
              @queue.acknowledge(id, pipeline: transaction)
              transaction.hdel(key('error-reports'), id)
            end
            transaction.hget(key('requeues-count'), id)
            record_stats(stats, pipeline: transaction)
          end
          record_flaky(id) if !skip_flaky_record && (error_reports_deleted_count.to_i > 0 || requeued_count.to_i > 0)
          nil
        end

But it's difficult because the return values would be incorrect in that case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well. error_reports_deleted_count is now already always 0 because it's deleted inside acknowledge.

Copy link
Contributor Author

@ChrisBr ChrisBr Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's deleted inside acknowledge.

Not acknowledge - we delete it in requeue. I think you might be mixing up acknowledge and requeue scripts?

We don't call acknowledge on a requeue, we only call requeue. Hence, we either do

  • success: call acknowledge and hdel the error-reports
  • requeue: call requeue (and don't call hdel in that case because it's inside requeue already)

Well. error_reports_deleted_count is now already always 0 because it's deleted inside acknowledge.

Right, but the assignments would be messed up, no?

Anyway, error_reports_deleted_count and requeued_count are really only relevant on success to identify a flaky test. It doesn't matter on a requeue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think you might be mixing up acknowledge and requeue scripts?

Ah indeed...


local pivot = redis.call('lrange', queue_key, -1 - offset, 0 - offset)[1]
if pivot then
redis.call('linsert', queue_key, 'BEFORE', pivot, test)
Expand Down
10 changes: 8 additions & 2 deletions ruby/lib/ci/queue/redis/build_record.rb
Original file line number Diff line number Diff line change
Expand Up @@ -65,9 +65,9 @@ def record_error(id, payload, stats: nil)
nil
end

def record_success(id, stats: nil, skip_flaky_record: false, acknowledge: true)
def record_success(id, stats: nil, skip_flaky_record: false)
_, error_reports_deleted_count, requeued_count, _ = redis.multi do |transaction|
@queue.acknowledge(id, pipeline: transaction) if acknowledge
@queue.acknowledge(id, pipeline: transaction)
transaction.hdel(key('error-reports'), id)
transaction.hget(key('requeues-count'), id)
record_stats(stats, pipeline: transaction)
Expand All @@ -76,6 +76,12 @@ def record_success(id, stats: nil, skip_flaky_record: false, acknowledge: true)
nil
end

def record_requeue(id, stats: nil)
redis.pipelined do |pipeline|
record_stats(stats, pipeline: pipeline)
end
end

def record_flaky(id, stats: nil)
redis.pipelined do |pipeline|
pipeline.sadd?(
Expand Down
1 change: 1 addition & 0 deletions ruby/lib/ci/queue/redis/worker.rb
Original file line number Diff line number Diff line change
Expand Up @@ -131,6 +131,7 @@ def requeue(test, offset: Redis.requeue_offset)
key('running'),
key('worker', worker_id, 'queue'),
key('owners'),
key('error-reports'),
],
argv: [config.max_requeues, global_max_requeues, test_key, offset],
) == 1
Expand Down
4 changes: 3 additions & 1 deletion ruby/lib/minitest/queue/build_status_recorder.rb
Original file line number Diff line number Diff line change
Expand Up @@ -51,8 +51,10 @@ def record(test)
stats = COUNTERS.zip(COUNTERS.map { |c| send(c) }).to_h
if (test.failure || test.error?) && !test.skipped?
build.record_error("#{test.klass}##{test.name}", dump(test), stats: stats)
elsif test.requeued?
build.record_requeue("#{test.klass}##{test.name}", stats: stats)
else
build.record_success("#{test.klass}##{test.name}", stats: stats, skip_flaky_record: test.skipped?, acknowledge: !test.requeued?)
build.record_success("#{test.klass}##{test.name}", stats: stats, skip_flaky_record: test.skipped?)
end
end

Expand Down
3 changes: 2 additions & 1 deletion ruby/test/integration/minitest_redis_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -509,14 +509,15 @@ def test_retry_report
assert_equal 100, error_reports.size

error_reports.keys.each_with_index do |test_id, index|
queue.instance_variable_set(:@reserved_tests, Set.new([test_id]))
queue.build.record_success(test_id.dup, stats: {
'assertions' => index + 1,
'errors' => 0,
'failures' => 0,
'skips' => 0,
'requeues' => 0,
'total_time' => index + 1,
}, acknowledge: false)
})
end

# Retry first worker, bailing out
Expand Down
Loading