Generate with the teacher model, saving:
- Token string
- Probability of each token
- Early-exit probabilities for each layer on each token
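A minimal sketch of what one saved record could look like, in plain Python with toy numbers. Everything here is an assumption for illustration: the names `TokenRecord`, `record_token`, `project`, and `softmax` are hypothetical; "early exit" is read as projecting each layer's hidden state through the output (unembedding) matrix and taking a softmax; the token is picked greedily; and only the probability of the emitted token is kept per layer, not the full distribution. A real teacher model would usually also apply its final normalization before the projection, which is omitted here.

```python
import math
from dataclasses import dataclass
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def project(hidden: List[float], unembed: List[List[float]]) -> List[float]:
    # unembed has one row per vocabulary entry; each row has hidden-size entries.
    return [sum(h * w for h, w in zip(hidden, row)) for row in unembed]


@dataclass
class TokenRecord:
    token: str                     # token string
    prob: float                    # final-layer probability of that token
    early_exit_probs: List[float]  # probability of that token at each layer


def record_token(layer_hiddens: List[List[float]],
                 unembed: List[List[float]],
                 vocab: List[str]) -> TokenRecord:
    # One vocabulary distribution per layer (the assumed early-exit readout).
    per_layer = [softmax(project(h, unembed)) for h in layer_hiddens]
    final = per_layer[-1]
    # Assumption: greedy decoding from the final layer's distribution.
    token_id = max(range(len(final)), key=final.__getitem__)
    return TokenRecord(
        token=vocab[token_id],
        prob=final[token_id],
        early_exit_probs=[dist[token_id] for dist in per_layer],
    )


# Toy example: 3-token vocabulary, hidden size 2, 3 layers.
vocab = ["a", "b", "c"]
unembed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layer_hiddens = [[0.0, 0.0], [0.5, 1.0], [1.0, 3.0]]
rec = record_token(layer_hiddens, unembed, vocab)
```

In this toy run the first layer's hidden state is all zeros, so its early-exit distribution is uniform, and the probability of the emitted token rises layer by layer. If the full per-layer distribution is needed instead, `early_exit_probs` could store `per_layer` directly, at a much higher storage cost.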