- An implementation for SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs
conda create -n skabench python=3.9.0
conda activate skabench
pip install openai
pip install asyncio
pip install uvloop

To generate test sets for the noisy robustness, order insensitivity and information integration testbeds, you can run:
python process_dataset.py --type KG --sequence random --scale 1k

NOTE:
Before running the code, set the data type with --type, the sequence type with --sequence, and the scale with --scale. The test set will then be generated in the dataset folder.
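
To generate several test sets in one go, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name `build_process_cmd` is hypothetical, and it uses only the flags that appear in this README (`--type`, `--sequence`, `--scale`, `--negative_rejection`).

```python
import shlex


def build_process_cmd(data_type, sequence, scale, negative_rejection=False):
    """Return the argument list for one process_dataset.py call.

    Hypothetical helper: it only mirrors the flags shown in this README.
    """
    cmd = [
        "python", "process_dataset.py",
        "--type", data_type,
        "--sequence", sequence,
        "--scale", scale,
    ]
    if negative_rejection:
        # The README passes the literal value "negative_rejection" to this flag.
        cmd += ["--negative_rejection", "negative_rejection"]
    return cmd


if __name__ == "__main__":
    # Example configurations taken from the commands and paths in this README.
    configs = [("KG", "random", "1k"), ("Table", "original", "4k")]
    for data_type, sequence, scale in configs:
        # Print the command so it can be checked before it is executed.
        print(shlex.join(build_process_cmd(data_type, sequence, scale)))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to actually generate the files.
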

To generate test sets for negative rejection, you can run:
python process_dataset.py --type Table --sequence original --scale 4k --negative_rejection negative_rejection
python process_dataset.py --type KG --sequence random --scale 4k --negative_rejection negative_rejection
python process_dataset.py --type Table+Text --sequence original --scale 12k --negative_rejection negative_rejection
python process_dataset.py --type KG+Text --sequence random --scale 12k --negative_rejection negative_rejection

To evaluate a model on the noisy robustness, order insensitivity and information integration testbeds, you can run:
python evaluate.py --type <type> --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table_original_42_4k.json

NOTE:
Replace <type> with the data type, <api_key> with your API key, <api_url> with the API URL, <model> with the model name, and ./dataset/Table_original_42_4k.json with the path to your dataset file.
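
The dataset paths in this README appear to follow the pattern `<type>_<sequence>_42_<scale>.json`, with `_negative_rejection` appended for the negative rejection testbed. Assuming that pattern holds (it is inferred from the example paths here, not confirmed against the code, and the meaning of `42` is not stated), a small sketch can rebuild the path and the evaluation command. The helper names `dataset_path` and `build_eval_cmd` and the `tag` parameter are hypothetical.

```python
def dataset_path(data_type, sequence, scale, negative_rejection=False,
                 tag="42", root="./dataset"):
    """Guess the generated file path from the naming seen in this README."""
    suffix = "_negative_rejection" if negative_rejection else ""
    return f"{root}/{data_type}_{sequence}_{tag}_{scale}{suffix}.json"


def build_eval_cmd(data_type, api_key, api_url, model, path,
                   script="evaluate.py"):
    """Return the argument list for an evaluation run.

    Pass script="evaluate_negative.py" for the negative rejection testbed.
    """
    return [
        "python", script,
        "--type", data_type,
        "--api_key", api_key,
        "--api_url", api_url,
        "--model", model,
        "--dataset_dir", path,
    ]


if __name__ == "__main__":
    path = dataset_path("Table", "original", "4k")
    # The placeholders below must be replaced with real values before running.
    print(build_eval_cmd("Table", "<api_key>", "<api_url>", "<model>", path))
```

Checking that the guessed path exists (for example with `os.path.exists`) before launching an evaluation would catch a mismatch with the actual file names early.
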

To evaluate negative rejection, you can run:
python evaluate_negative.py --type KG --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/KG_random_42_4k_negative_rejection.json
python evaluate_negative.py --type Table --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table_original_42_4k_negative_rejection.json
python evaluate_negative.py --type KG+Text --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/KG+Text_random_42_12k_negative_rejection.json
python evaluate_negative.py --type Table+Text --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table+Text_original_42_12k_negative_rejection.json

Please consider citing this paper if you find our work useful.
@article{liu2025ska,
title={SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs},
author={Liu, Zhiqiang and Niu, Enpei and Hua, Yin and Sun, Mengshu and Liang, Lei and Chen, Huajun and Zhang, Wen},
journal={arXiv preprint arXiv:2507.17178},
year={2025}
}