45819b9 to f8d3efc
```lua
redis.call('hincrby', requeues_count_key, '___total___', 1)
redis.call('hincrby', requeues_count_key, test, 1)

redis.call('hdel', error_reports_key, test)
```
If we're doing hdel here, the transaction.hdel(key('error-reports'), id) becomes redundant.
But also since they were in a transaction, I don't see what this changes.
This is requeue, which is not in a transaction. It's called from the queue:
- We call requeue from the queue
- We go through the reporters
- BuildStatusRecorder calls record_success, which removes the error report
This opens a window for a race: as soon as the test is requeued, another worker can run it and write a new error report, which the worker that did the requeue could then delete.
It's also not redundant, because we now call record_requeue (instead of record_success), which only records the stats. We still need the delete in record_success, where it's already in a transaction with acknowledge.
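The interleaving described above can be replayed with a minimal in-memory sketch in plain Ruby (no Redis). `FakeQueue` and its methods are hypothetical stand-ins, not ci-queue's API, and the single-threaded script simply executes the problematic ordering step by step. In the old flow the requeuing worker's reporter deletes the error report after the test is already back in the queue; in the new flow the delete happens in the same step as the requeue, which is what the hdel inside the requeue Lua script now does.

```ruby
# Hypothetical in-memory stand-in for the Redis keys involved; not ci-queue's API.
class FakeQueue
  attr_reader :queue, :error_reports

  def initialize
    @queue = []          # tests waiting to be picked up
    @error_reports = {}  # test id => error report
  end

  # Old flow: only push the test back; the error report is left for the reporter.
  def requeue_only(test)
    @queue.unshift(test)
  end

  # New flow: push back and delete the stale report in one step, mirroring the
  # hdel that now lives inside the (atomic) requeue Lua script.
  def requeue_and_clear(test)
    @queue.unshift(test)
    @error_reports.delete(test)
  end

  # Reporter cleanup that used to run after requeue (via record_success).
  def delete_report(test)
    @error_reports.delete(test)
  end

  # Another worker pops the test, runs it, and it fails again.
  def run_and_fail(worker)
    test = @queue.shift
    @error_reports[test] = "failed on #{worker}"
  end
end

# Old ordering: worker A's late delete wipes out worker B's fresh report.
old = FakeQueue.new
old.error_reports["test_a"] = "failed on worker A"
old.requeue_only("test_a")        # worker A requeues
old.run_and_fail("worker B")      # worker B picks it up and fails
old.delete_report("test_a")       # worker A's reporter cleans up too late
old_report = old.error_reports["test_a"]

# New ordering: the stale report is removed together with the requeue.
new_q = FakeQueue.new
new_q.error_reports["test_a"] = "failed on worker A"
new_q.requeue_and_clear("test_a") # worker A requeues and clears
new_q.run_and_fail("worker B")    # worker B picks it up and fails
new_report = new_q.error_reports["test_a"]

puts old_report.inspect # => nil (worker B's failure was lost)
puts new_report.inspect # => "failed on worker B"
```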
ci-queue/ruby/lib/minitest/queue.rb, lines 189 to 191 in d154d85
> It's also not redundant

In record_success it very much seems redundant:
```ruby
@queue.acknowledge(id, pipeline: transaction)
transaction.hdel(key('error-reports'), id)
```
If we call requeue, we no longer call record_success but record_requeue, which only records the stats:
```ruby
def record_requeue(id, stats: nil)
  redis.pipelined do |pipeline|
    record_stats(stats, pipeline: pipeline)
  end
end
```
Before, we were calling record_success on requeues, which was:
- deleting the error report
- setting the stats

Since we now delete the error report inside the requeue Lua script, I've created a dedicated method.
The alternative would be to change record_success to something like this:
```ruby
def record_success(id, stats: nil, skip_flaky_record: false, acknowledge: true)
  _, error_reports_deleted_count, requeued_count, _ = redis.multi do |transaction|
    if acknowledge
      @queue.acknowledge(id, pipeline: transaction)
      transaction.hdel(key('error-reports'), id)
    end
    transaction.hget(key('requeues-count'), id)
    record_stats(stats, pipeline: transaction)
  end
  record_flaky(id) if !skip_flaky_record && (error_reports_deleted_count.to_i > 0 || requeued_count.to_i > 0)
  nil
end
```

But that's difficult, because the return values would be incorrect in that case?
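The return-value problem can be shown without Redis: a MULTI block yields one reply per queued command, in order, so skipping the two acknowledge commands shifts every later position in the positional destructuring. This sketch uses made-up reply arrays and assumes each queued call (including record_stats) contributes exactly one reply; `destructure` is a hypothetical helper that copies the destructuring from the proposed record_success.

```ruby
# Made-up reply arrays, assuming each queued call (including record_stats)
# contributes exactly one reply, in order.

# acknowledge: true  -> [acknowledge, hdel error-reports, hget requeues-count, stats]
with_ack = [1, 1, "2", "OK"]
# acknowledge: false -> [hget requeues-count, stats]
without_ack = ["2", "OK"]

# Same positional destructuring as the proposed record_success.
def destructure(replies)
  _, error_reports_deleted_count, requeued_count, _ = replies
  [error_reports_deleted_count, requeued_count]
end

p destructure(with_ack)    # => [1, "2"]
p destructure(without_ack) # => ["OK", nil]
```

With acknowledge: false the requeue count lands in the discarded first `_`, and requeued_count.to_i becomes 0 even for a requeued test. Since these counts only drive flaky detection on success, this mostly matters if the flag were ever combined with flaky recording.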
Well. error_reports_deleted_count is now already always 0 because it's deleted inside acknowledge.
> it's deleted inside acknowledge.

Not acknowledge; we delete it in requeue. I think you might be mixing up the acknowledge and requeue scripts?
We don't call acknowledge on a requeue, we only call requeue. So we do one of the following:
- success: call acknowledge and hdel the error-reports entry
- requeue: call requeue (and don't call hdel in that case, because it's already inside requeue)
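A rough model of the two paths, assuming an in-memory `Store` in place of Redis. All names here are hypothetical and only loosely mirror the methods discussed in this thread; the sketch just checks that each path removes the error report exactly once: success via acknowledge plus hdel, requeue via the requeue script alone, followed by a stats-only record_requeue.

```ruby
# Hypothetical in-memory model of the two paths; not ci-queue's implementation.
class Store
  attr_reader :error_reports, :log

  def initialize
    @error_reports = { "test_a" => "boom" }
    @log = [] # every operation, in order
  end

  def acknowledge(id)
    @log << [:acknowledge, id]
  end

  def hdel_error_report(id)
    @log << [:hdel, id]
    @error_reports.delete(id)
  end

  # Stands in for the requeue Lua script, which now does its own hdel.
  def requeue(id)
    @log << [:requeue, id]
    hdel_error_report(id)
  end

  def record_stats(stats)
    @log << [:stats, stats]
  end
end

# Success: acknowledge and hdel together (a MULTI transaction in the real code).
def record_success(store, id, stats: nil)
  store.acknowledge(id)
  store.hdel_error_report(id)
  store.record_stats(stats)
end

# Requeue: the script already cleared the report, so only stats are recorded.
def record_requeue(store, id, stats: nil)
  store.record_stats(stats)
end

success = Store.new
record_success(success, "test_a", stats: { passed: 1 })

requeued = Store.new
requeued.requeue("test_a")
record_requeue(requeued, "test_a", stats: { requeued: 1 })

p success.log.count { |op, _| op == :hdel }  # => 1
p requeued.log.count { |op, _| op == :hdel } # => 1
```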
> Well. error_reports_deleted_count is now already always 0 because it's deleted inside acknowledge.
Right, but the assignments would be messed up, no?
Anyway, error_reports_deleted_count and requeued_count are really only relevant on success to identify a flaky test. It doesn't matter on a requeue.
> think you might be mixing up acknowledge and requeue scripts?
Ah indeed...
/shipit
When a test gets requeued, we first put it back into the list and only later remove the error report in the reporter. However, a race can occur if another worker picks up the test in the meantime and it fails: the worker that requeued the test would then remove the new error report.