While going through your code, I found a function, `cal_return_to_go`, that requires a config dictionary specifying the high/low reward values for each environment.
What is the purpose of these values, and what should we do in real-world problems where the high/low rewards of the environment cannot be known in advance?
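For context, here is my guess at what a function like this usually does. It is only a sketch under my own assumptions, not the repo's code, and the names `compute_return_to_go`, `normalize_rtg`, and `estimate_bounds` are hypothetical. The sketch computes the return-to-go at each step, min-max scales it with low/high bounds, and, as one possible fallback when the bounds are unknown, estimates them from the episode returns in an offline dataset.

```python
import numpy as np


def compute_return_to_go(rewards, gamma=1.0):
    """Hypothetical helper: R_t = sum_{k >= t} gamma^(k - t) * r_k."""
    rtg = np.zeros(len(rewards), dtype=float)
    running = 0.0
    # Accumulate from the last step backwards.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def normalize_rtg(rtg, low, high):
    """Hypothetical helper: min-max scale return-to-go with given bounds."""
    span = high - low
    if span == 0:
        # Degenerate bounds: avoid division by zero.
        return np.zeros_like(rtg, dtype=float)
    return (np.asarray(rtg, dtype=float) - low) / span


def estimate_bounds(trajectories, gamma=1.0):
    """Assumed fallback: use the min/max episode return seen in the data."""
    returns = [compute_return_to_go(r, gamma)[0] for r in trajectories if len(r) > 0]
    return float(min(returns)), float(max(returns))


if __name__ == "__main__":
    episodes = [[1.0, 2.0, 3.0], [0.0, 1.0]]
    low, high = estimate_bounds(episodes)
    print(low, high)
    print(normalize_rtg(compute_return_to_go(episodes[0]), low, high))
```

Estimating the bounds from data only covers returns that already appear in the dataset, so returns outside that range would fall outside [0, 1] after scaling. This is part of why I'm asking how the config values are meant to be chosen.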