SKA-Bench

An implementation of SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs.

Environment

conda create -n skabench python=3.9.0
conda activate skabench
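pip install openai
pip install uvloop

asyncio is part of the Python standard library (since Python 3.4), so it does not need to be installed with pip. The asyncio package on PyPI is an old release intended for Python 3.3.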
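A quick way to confirm the environment before running the scripts is a small import check like the one below. It is a sketch and not part of the repository. It checks only the packages installed above (openai, uvloop) plus asyncio, which comes from the standard library.

```python
import importlib.util
import sys

# Packages installed via pip above; asyncio is checked too, but it ships with Python itself.
REQUIRED = ["openai", "uvloop", "asyncio"]


def check_environment(modules=REQUIRED):
    """Return a dict mapping each module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    for name, ok in check_environment().items():
        print(f"{name:8s} {'ok' if ok else 'MISSING'}")
```

Note that uvloop does not support Windows, so on that platform it cannot be installed and the check reports it as missing.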

Testbed Construction

For the noise robustness, order insensitivity, and information integration testbeds, you can run:

python process_dataset.py --type KG --sequence random --scale 1k

NOTE:

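Before running the script, set the three options:

- --type: the data type (KG, Table, KG+Text, or Table+Text in the examples here).
- --sequence: the order of the structured knowledge (original or random).
- --scale: the size of the structured knowledge context (1k, 4k, and 12k appear in the examples).

The generated test set is written to the dataset folder.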
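To build several testbeds at once, the command can be scripted. The sketch below is illustrative, and the helper names are hypothetical.

- The option values are only those that appear in this README.
- The output filename pattern <type>_<sequence>_42_<scale>.json is inferred from the dataset paths used in the evaluation commands, so check it against what process_dataset.py actually writes.

```python
import itertools
import subprocess
import sys
from pathlib import Path

# Values that appear in this README; process_dataset.py may accept others.
TYPES = ["KG", "Table", "KG+Text", "Table+Text"]
SEQUENCES = ["original", "random"]
SCALES = ["1k", "4k", "12k"]


def build_command(data_type, sequence, scale):
    """Return the argv list for one process_dataset.py run (hypothetical helper)."""
    return [sys.executable, "process_dataset.py",
            "--type", data_type, "--sequence", sequence, "--scale", scale]


def expected_output(data_type, sequence, scale, dataset_dir="dataset"):
    """Guess the output path; the '42' component is copied from the README's example paths."""
    return Path(dataset_dir) / f"{data_type}_{sequence}_42_{scale}.json"


def generate_all(types=TYPES, sequences=SEQUENCES, scales=SCALES, dry_run=True):
    """Run (or, with dry_run=True, only collect) one command per configuration."""
    commands = []
    for data_type, sequence, scale in itertools.product(types, sequences, scales):
        cmd = build_command(data_type, sequence, scale)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing configuration
    return commands


if __name__ == "__main__":
    for cmd in generate_all(dry_run=True):
        print(" ".join(cmd[1:]))
```

With dry_run=False the commands are executed, which assumes the script is run from the repository root. check=True makes the loop stop at the first configuration that fails.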

For negative rejection, you can run:

python process_dataset.py --type Table --sequence original --scale 4k --negative_rejection negative_rejection
python process_dataset.py --type KG --sequence random --scale 4k --negative_rejection negative_rejection
python process_dataset.py --type Table+Text --sequence original --scale 12k --negative_rejection negative_rejection
python process_dataset.py --type KG+Text --sequence random --scale 12k --negative_rejection negative_rejection
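The four negative-rejection commands above can likewise be driven from one script. This is a sketch. The configuration list simply mirrors the commands shown, and the function name is hypothetical. By default it only prints the commands, and it executes them when called with --run.

```python
import subprocess
import sys

# (type, sequence, scale) exactly as in the four commands above.
NEGATIVE_CONFIGS = [
    ("Table", "original", "4k"),
    ("KG", "random", "4k"),
    ("Table+Text", "original", "12k"),
    ("KG+Text", "random", "12k"),
]


def negative_rejection_commands(configs=NEGATIVE_CONFIGS):
    """Build one process_dataset.py argv list per negative-rejection configuration."""
    return [
        [sys.executable, "process_dataset.py",
         "--type", data_type, "--sequence", sequence, "--scale", scale,
         "--negative_rejection", "negative_rejection"]
        for data_type, sequence, scale in configs
    ]


if __name__ == "__main__":
    run = "--run" in sys.argv  # print only, unless --run is passed
    for cmd in negative_rejection_commands():
        print(" ".join(cmd[1:]))
        if run:
            subprocess.run(cmd, check=True)
```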

Evaluation scripts

For the noise robustness, order insensitivity, and information integration testbeds, you can run:

python evaluate.py --type <type> --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table_original_42_4k.json

NOTE:

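Replace the placeholders before running the command:

- <type>: the data type.
- <api_key> and <api_url>: your API key and endpoint URL.
- <model>: the name of the model to evaluate.
- ./dataset/Table_original_42_4k.json: the path to the test set generated in the previous step.

The value of <type> must match the data type of that test set.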
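A small wrapper can handle two practical details. It can derive --type from the dataset filename, since the two must match. It can also read the API key from the environment, which keeps the key out of shell history.

This is a hypothetical helper, not part of the repository. The environment variable names SKA_API_KEY and SKA_API_URL are placeholders. The wrapper also assumes that generated filenames start with the data type followed by an underscore, as in the example paths here.

```python
import os
import sys
from pathlib import Path

KNOWN_TYPES = {"KG", "Table", "KG+Text", "Table+Text"}


def infer_type(dataset_path):
    """Take the data type from a filename such as Table_original_42_4k.json."""
    data_type = Path(dataset_path).name.split("_", 1)[0]
    if data_type not in KNOWN_TYPES:
        raise ValueError(f"cannot infer --type from {dataset_path!r}")
    return data_type


def evaluate_command(dataset_path, model, script="evaluate.py"):
    """Build the argv list; the key and URL come from placeholder environment variables."""
    return [sys.executable, script,
            "--type", infer_type(dataset_path),
            "--api_key", os.environ["SKA_API_KEY"],
            "--api_url", os.environ["SKA_API_URL"],
            "--model", model,
            "--dataset_dir", str(dataset_path)]
```

After exporting both variables, you can run one evaluation by passing the result to subprocess.run, for example evaluate_command("./dataset/Table_original_42_4k.json", "<model>") with check=True.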

For negative rejection, you can run:

python evaluate_negative.py --type KG --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/KG_random_42_4k_negative_rejection.json
python evaluate_negative.py --type Table --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table_original_42_4k_negative_rejection.json
python evaluate_negative.py --type KG+Text --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/KG+Text_random_42_12k_negative_rejection.json
python evaluate_negative.py --type Table+Text --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table+Text_original_42_12k_negative_rejection.json
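To evaluate every negative-rejection test set in the dataset folder in one go, a loop can drive the four commands. This is again a sketch built on several assumptions:

- It relies on the *_negative_rejection.json suffix and the type-first naming seen in the paths above.
- SKA_API_KEY and SKA_API_URL are placeholder environment variables.
- run_negative.py is a hypothetical file name for the script.

```python
import os
import subprocess
import sys
from pathlib import Path


def negative_datasets(dataset_dir="dataset"):
    """List negative-rejection test sets, e.g. KG_random_42_4k_negative_rejection.json."""
    return sorted(Path(dataset_dir).glob("*_negative_rejection.json"))


def negative_command(path, model):
    """Build one evaluate_negative.py call; --type is the filename prefix before the first '_'."""
    return [sys.executable, "evaluate_negative.py",
            "--type", path.name.split("_", 1)[0],
            "--api_key", os.environ["SKA_API_KEY"],
            "--api_url", os.environ["SKA_API_URL"],
            "--model", model,
            "--dataset_dir", str(path)]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical script name): python run_negative.py <model>
    for path in negative_datasets():
        subprocess.run(negative_command(path, sys.argv[1]), check=True)
```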

🤝 Cite:

Please consider citing this paper if you find our work useful.


@article{liu2025ska,
  title={SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs},
  author={Liu, Zhiqiang and Niu, Enpei and Hua, Yin and Sun, Mengshu and Liang, Lei and Chen, Huajun and Zhang, Wen},
  journal={arXiv preprint arXiv:2507.17178},
  year={2025}
}
