Why are exe_acc, prog_acc, and backup_acc all 0 when I run run_AR-LSAT.sh with the GPT-4o model?