1. Plot the distribution of early exits for different KL scaling factors. 2. Investigate the coherence of final answers for different KL scaling factors. [not urgent, for final paper]