Implement K-Learning with Surprise in DreamerV3 Architecture

Replace the vanilla actor-critic in the DreamerV3 architecture with an epistemic-risk-seeking actor-critic (ERSAC), and measure the benefit of modulating risk with both reward variance and Bayesian surprise.

ERSAC paper: https://arxiv.org/pdf/2302.09339
Bayesian surprise paper: https://arxiv.org/abs/2310.08731
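The following is a minimal sketch of the kind of risk-seeking value that K-learning and ERSAC build on. It assumes the value estimate X is Gaussian with mean μ and variance σ². Under that assumption the exponential-utility certainty equivalent τ log E[exp(X/τ)] has the closed form μ + σ²/(2τ), so uncertainty acts as an optimism bonus scaled by 1/τ. The function names risk_seeking_value, combined_variance, and boltzmann_policy are hypothetical. The rule that adds a β-weighted Bayesian-surprise term to the reward variance is an assumption made for this project and does not come from either paper.

```python
import math
from typing import List


def risk_seeking_value(mean: float, var: float, tau: float) -> float:
    """Certainty equivalent tau * log E[exp(X / tau)] for X ~ N(mean, var).

    For a Gaussian X this equals mean + var / (2 * tau). Larger variance
    raises the value (optimism), and a larger tau weakens the effect.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    return mean + var / (2.0 * tau)


def combined_variance(reward_var: float, surprise: float, beta: float) -> float:
    """Hypothetical mixing rule (an assumption, not from either paper):
    treat a beta-weighted Bayesian-surprise signal as extra variance."""
    return reward_var + beta * surprise


def boltzmann_policy(k_values: List[float], tau: float) -> List[float]:
    """Softmax over K-values at temperature tau.

    The max is subtracted before exponentiating for numerical stability.
    """
    m = max(k_values)
    exps = [math.exp((k - m) / tau) for k in k_values]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    tau, beta = 1.0, 0.5
    means = [1.0, 1.0]        # two actions with the same expected value
    reward_vars = [0.1, 0.1]  # and the same reward variance
    surprises = [0.0, 2.0]    # but the second action is more "surprising"
    k = [
        risk_seeking_value(m, combined_variance(v, s, beta), tau)
        for m, v, s in zip(means, reward_vars, surprises)
    ]
    print(k, boltzmann_policy(k, tau))
```

With equal means and reward variances, the surprise term alone shifts probability toward the second action. In a DreamerV3 setting the means and variances would instead come from the world model and critic, and τ would need to be tuned or learned. ERSAC treats τ as an optimized quantity, so the fixed value used here is only for illustration.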