This repo is forked from ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models. We used and modified this code only to test the results with LLaMA3-8B (details below) on the MATH dataset. For the HotpotQA dataset we did not have enough computational resources and time to finish the full test, but the code is runnable.
Prepare a Linux environment (on Windows, Ubuntu under WSL works). We use Linux because the shell scripts are written in Unix style, which is not supported by PowerShell. Since they are easy to modify, Windows is also fine.
We tested this code on Python 3.9 (we first tried Python 3.12, but it did not work).
conda create -n [your environment name] python=3.9
conda activate [your environment name]
pip install -r requirement.txt
We don't want to modify the code too much, so we use Ollama. Please visit the Ollama website to download and install Ollama.
Once you have installed Ollama, for Windows users, run:
ollama serve
ollama pull llama3
ollama run llama3
Note that if you are using WSL and have deployed Ollama on the Windows side, you need to change Ollama's listening IP from localhost to 0.0.0.0.
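One way to do this (a sketch: OLLAMA_HOST is Ollama's documented environment variable for the bind address, and reading the nameserver from /etc/resolv.conf is a common way to find the Windows host IP from WSL) is:

```shell
# In PowerShell on Windows: bind Ollama to all interfaces instead of
# localhost, then restart the server so the setting takes effect.
$env:OLLAMA_HOST = "0.0.0.0"
ollama serve

# In WSL: the Windows host is usually reachable at the nameserver address.
grep nameserver /etc/resolv.conf   # e.g. nameserver 172.23.112.1
```

The IP printed by the last command is what you will put into the script in the next step.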
Now only a few small changes are needed for LLaMA.
For MATH dataset:
Go to line 653 of ./math/solve_turbo_chatcot.py and change the following line to your Ollama IP and port:
openai.api_base = "http://172.23.112.1:11434/v1"
To test results on a certain subset of MATH:
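As a sanity check, the base URL is just your Ollama host and port with a /v1 suffix (Ollama exposes an OpenAI-compatible API under that path). A hypothetical helper, assuming the host is the example IP above and the port is Ollama's default 11434:

```python
# Hypothetical helper: build the OpenAI-compatible base URL for an Ollama
# server. 11434 is Ollama's default port; the host is the example IP above.
def ollama_api_base(host: str, port: int = 11434) -> str:
    return f"http://{host}:{port}/v1"

# Line 653 of solve_turbo_chatcot.py would then correspond to:
#   openai.api_base = ollama_api_base("172.23.112.1")
print(ollama_api_base("172.23.112.1"))  # → http://172.23.112.1:11434/v1
```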
cd math
bash ./scripts/run_turbo_chatcot.sh
To choose which subset is tested, open ./scripts/run_turbo_chatcot.sh and change these two lines:
DATA_SPLIT=intermediate_algebra # See "knowledge_point" in ./dataset/math/test_retrieval-all.json for all subsets.
RESULT_FOLDER=result/math_ia # change this too
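For example, switching the run to another subset might look like the following sketch. The exact subset string must be taken from "knowledge_point" in ./dataset/math/test_retrieval-all.json (prealgebra here is an assumption), and result/math_pre is just a hypothetical folder name:

```shell
# Hypothetical edit of ./scripts/run_turbo_chatcot.sh for another subset:
DATA_SPLIT=prealgebra          # must match a "knowledge_point" value exactly
RESULT_FOLDER=result/math_pre  # any fresh folder; keep one folder per subset
```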