Thank you very much for sharing the whole implementation! I am curious about the dropout steps in this figure. May I ask some questions?

- As shown in the code here, I guess that dropout on the indicator is applied during the warmup steps (6255 steps), with a keep probability (rather than a drop probability) of 0.6; after 6255 steps, the keep probability is 100%, and the model chooses among 3x3, 5x5, exp=3, and exp=6 using the learned threshold. Is my interpretation correct?
- Why is the runtime term larger during the dropout steps? And why does the runtime then decrease so rapidly, settling precisely at 79 ms? What would the curve look like if the hyperparameter lambda=0.02 were set to other values?
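
To make the first question concrete, here is a minimal sketch of the schedule as I understand it. This is only my guess, not the repository's actual code: the names `indicator_keep_prob` and `apply_indicator_dropout` are hypothetical, and I am assuming that warmup covers steps 0 to 6254 and that dropped indicators are simply zeroed without rescaling.

```python
import random

WARMUP_STEPS = 6255  # number of warmup steps, taken from my reading of the code
KEEP_PROB = 0.6      # assumed keep probability (not drop probability) during warmup


def indicator_keep_prob(step, warmup_steps=WARMUP_STEPS, keep_prob=KEEP_PROB):
    """Return the keep probability for the indicator at a given training step.

    Assumption: dropout is active only while step < warmup_steps.
    """
    return keep_prob if step < warmup_steps else 1.0


def apply_indicator_dropout(indicators, step, rng=random):
    """Zero each indicator with probability (1 - keep probability).

    Assumption: no inverted-dropout rescaling is applied to the kept values.
    """
    p = indicator_keep_prob(step)
    # rng.random() is in [0, 1), so with p == 1.0 every indicator is kept.
    return [x if rng.random() < p else 0.0 for x in indicators]
```

If this matches the implementation, then after step 6255 every indicator passes through unchanged and only the learned threshold decides which of 3x3, 5x5, exp=3, and exp=6 is selected.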