Description
Found in v0.2.0 during containerized release testing.
Test revision: 9f6d609 - chore(TestNetRewarder): update emoji on SlotFilled event (#10)
Test: "SlotRepairTest.SingleFailure" (the double-failure test is also affected).
Setup:
The network has 6 hosts.
There is 1 storage request.
Each host has enough tokens to place collateral on 1 slot of the storage request.
Test:
A new host is started.
The oldest host that has previously filled a slot is stopped.
Assert that the correct slot-freed event occurs.
Assert that a new slot-filled event occurs.
These steps are repeated up to 10 times.
Eventually, all original hosts have been replaced.
We assert that the storage request does not fail, and all slots are always repaired.
Failure:
After several loops, the new slot-filled event does not occur.
This happens only intermittently. More loops make the failure more likely to be detected.
Cause:
When the slot-freed event occurs, all hosts rush to perform a reserve-slot transaction.
Only 3 hosts (the default MaxSlotReservations config value) will succeed in reserving the slot.
Those 3 will proceed to download and repair the data.
Then they will generate the initial storage proof.
Then they will attempt to call the fill-slot contract function, which requires the slot collateral to be locked in.
If the 3 hosts that won the reserve race are already hosting a slot, they do not have enough free tokens for the collateral, and their fill-slot calls fail.
Meanwhile, the other hosts, whose reservations failed, have already discarded the slot.
The slot remains free: none of the hosts that reserved it can fill it.
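The race described above can be reproduced in a small simulation. This is a sketch only. `Host` and `repair_freed_slot` are hypothetical names and are not taken from the client. The model assumes that the first reserving host with enough free tokens is the one that fills the slot, and it takes the reservation limit of 3 from the MaxSlotReservations default mentioned above.

```python
from dataclasses import dataclass

# Default value of the MaxSlotReservations config mentioned in this issue.
MAX_SLOT_RESERVATIONS = 3


@dataclass
class Host:
    # Hypothetical model of a storage host, not the real client's data type.
    name: str
    balance: int           # free tokens not already locked as collateral
    slots_hosted: int = 0


def repair_freed_slot(reserve_order, collateral):
    """Simulate one slot-freed event with no balance check before reserving.

    reserve_order lists hosts in the order their reserve-slot transactions land.
    Returns the name of the host that fills the slot, or None if it stays free.
    """
    # Only the first MAX_SLOT_RESERVATIONS reservations succeed; every other
    # host sees its reservation fail and discards the slot.
    reserved = reserve_order[:MAX_SLOT_RESERVATIONS]
    for host in reserved:
        # Download, repair and proof generation are omitted here; fill-slot is
        # the step that must lock the slot collateral.
        if host.balance >= collateral:
            host.balance -= collateral
            host.slots_hosted += 1
            return host.name
    # Every reserving host lacked free collateral, so the slot remains free.
    return None
```

In the first scenario, three hosts each already host a slot and have no free tokens left. If those three win the race, `repair_freed_slot(busy + [fresh], 10)` returns `None`, even though `fresh` could have filled the slot. In the second scenario, `fresh` reserves first, and the same call with that ordering fills the slot.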
Fix:
Before performing the reserve-slot call in the sale-preparing state, check that enough tokens are available for the required collateral. This prevents the issue from occurring, and it also saves host operators the transaction costs of reserve-slot calls for slots they would not be able to fill anyway.
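A minimal sketch of the proposed pre-check follows. `prepare_sale` and `reserve_slot` are hypothetical stand-ins for the sale-preparing state and the reserve-slot transaction. The sketch also assumes that `balance` reports only free tokens, excluding any collateral that is already locked.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical host model; balance counts free (unlocked) tokens only.
    balance: int


def prepare_sale(host, slot_collateral, reserve_slot):
    """Hypothetical sale-preparing step with a balance check before reserving.

    reserve_slot is a callable standing in for the reserve-slot transaction.
    """
    if host.balance < slot_collateral:
        # fill-slot would fail later, so skip the reservation (and its fee).
        return "ignored"
    reserve_slot()
    return "reserving"
```

With this check in place, a host whose tokens are all locked never spends a reserve-slot transaction. It also never occupies one of the limited reservations, which leaves that reservation to a host that can actually fill the slot.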
To consider: What if multiple slots are being reserved or filled concurrently? Can the balance check be done in a way that handles this situation?
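The sketch below shows one possible answer to the concurrency question, offered as an assumption rather than a decided design. The idea is for each host to keep a local ledger that earmarks collateral for in-flight sales. The check and the earmark happen atomically under a lock, so two concurrent sales cannot both count the same free tokens. `CollateralLedger` and its methods are hypothetical names.

```python
import threading


class CollateralLedger:
    """Hypothetical per-host ledger that earmarks collateral for slots being
    reserved/filled concurrently, so in-flight sales never double-count
    the same free tokens."""

    def __init__(self, balance):
        self._balance = balance      # free tokens (collateral not yet locked)
        self._earmarked = 0          # tokens promised to in-flight sales
        self._lock = threading.Lock()

    def try_earmark(self, collateral):
        # Check and earmark atomically; otherwise two sales could both pass
        # the balance check before either one locks its collateral.
        with self._lock:
            if self._balance - self._earmarked < collateral:
                return False
            self._earmarked += collateral
            return True

    def release(self, collateral):
        # Reservation or fill-slot failed: make the earmarked tokens free again.
        with self._lock:
            self._earmarked -= collateral

    def commit(self, collateral):
        # fill-slot succeeded: the collateral is now locked, so it leaves
        # both the free balance and the earmark.
        with self._lock:
            self._earmarked -= collateral
            self._balance -= collateral

    def available(self):
        with self._lock:
            return self._balance - self._earmarked
```

In this design, a sale calls `try_earmark` before reserve-slot. It calls `release` if the reservation or the fill-slot call fails, and `commit` once fill-slot succeeds. A crash between earmark and release would leave tokens earmarked until restart, so a real implementation would likely need to rebuild the ledger from on-chain state.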