This repository contains the system developed by AILS-NTUA for SemEval 2025 Task 8: Question Answering Over Tabular Data.
The system ranked 1st in the proprietary models ranking in both subtasks of the competition.
The system description paper will be published at ACL 2025. The preprint is available on arXiv.
The system converts user queries to Python code (Text-to-Python) by prompting Large Language Models (LLMs). More details on the architecture can be found in the paper.
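As a rough illustration of the text-to-Python idea, the sketch below builds a schema-aware prompt and executes model-generated code against a pandas DataFrame. The helper names, prompt wording, and example data are illustrative assumptions, not taken from this repository, and the `generated` string stands in for a real LLM reply; the actual prompts and pipeline are described in the paper.

```python
import pandas as pd

def build_prompt(question: str, df: pd.DataFrame) -> str:
    """Compose a prompt asking an LLM to answer the question with Python over df.
    Illustrative wording only; the system's real prompts are in the paper."""
    schema = ", ".join(f"{col} ({df[col].dtype})" for col in df.columns)
    return (
        f"You are given a pandas DataFrame `df` with columns: {schema}.\n"
        f"Write Python code that stores the answer in a variable named `result`.\n"
        f"Question: {question}"
    )

def run_generated_code(code: str, df: pd.DataFrame):
    """Execute the model-generated code and return its `result` variable."""
    scope = {"df": df, "pd": pd}
    exec(code, scope)
    return scope["result"]

# Hand-written stand-in for an LLM reply, for demonstration only:
df = pd.DataFrame({"city": ["Athens", "Patras"], "population": [3_154_000, 215_922]})
generated = "result = df.loc[df['population'].idxmax(), 'city']"
print(run_generated_code(generated, df))  # Athens
```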
> **Note:** Tested on Python 3.12.
- Clone the repository.
- Install the required packages from `requirements.txt`:

  ```bash
  pip install -r requirements.txt
  ```
- Set up credentials or models based on the evaluation scenario:
  - For evaluating Claude 3.5 Sonnet or Llama 3.1 Instruct-405B: Create a `.env` file in the root directory and add AWS credentials:

    ```
    AWS_ACCESS_KEY_ID=your_access_key_id
    AWS_SECRET_ACCESS_KEY=your_secret_access_key
    ```
  - For evaluating Ollama models (`llama3.1:8b`, `llama3.3:70b`, `qwen2.5-coder:7b`): Download the models by following the instructions on the Ollama website, and ensure that Ollama is installed and running on port 11434.
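To sanity-check either setup before running the pipeline, one can verify that the `.env` credentials are readable or that a local Ollama server answers on port 11434. The two helpers below are an illustrative sketch, not part of this repository (the repo presumably loads `.env` with a dotenv-style library); `/api/tags` is a standard Ollama endpoint that lists installed models.

```python
import urllib.request
from pathlib import Path

def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=value lines from a .env file (illustrative helper)."""
    env = {}
    p = Path(path)
    if not p.exists():
        return env
    for line in p.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def ollama_running(host: str = "localhost", port: int = 11434) -> bool:
    """Return True if an Ollama server answers on the given host and port."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}/api/tags", timeout=2) as r:
            return r.status == 200
    except OSError:
        return False

creds = load_env()
if "AWS_ACCESS_KEY_ID" in creds:
    print("AWS credentials found in .env")
elif ollama_running():
    print("Ollama is reachable on port 11434")
else:
    print("No AWS credentials and no local Ollama server detected")
```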
- Download `competition.zip` and extract it in the root directory to run the model on the DataBench Test Set (this is the default behavior; another split can be loaded instead, as shown on the Hugging Face page). The archive can be downloaded from the DataBench Competition Page or directly from here.

  ```bash
  unzip competition.zip
  ```
- Download the `answers.zip` file with the answers for the test set and extract it into the `competition/answers/` directory:

  ```bash
  wget https://raw.githubusercontent.com/jorses/databench_eval/main/examples/answers.zip
  mkdir -p competition/answers
  unzip answers.zip -d competition/answers
  ```
- Run the `main.py` script, passing the pipeline specification as input. All pipelines are found in the `config/` folder. Include the `--lite` flag to run on DataBench lite.

  ```bash
  python main.py --pipeline config/claude3.5-sonnet
  # or
  python main.py --pipeline config/claude3.5-sonnet --lite
  ```
- The results will be saved in a new `results/` directory.