LLMSafetyBenchmark介绍

本项目旨在全面评估大型语言模型(LLM)的安全能力，通过多维度测试框架检测模型在安全相关场景下的表现。评估体系包含选择题、主观题、CTF实践题等多种题型，覆盖不同难度层次的安全问题。本项目获得第三届中国研究生网络安全创新大赛揭榜挑战赛赛题全国二等奖

使用方法

测试脚本

# 参数说明：
# --model       指定模型名称
# --fewshot     选择 few-shot 示例数量（可选 2 或 5）
# --apikey      硅基流动的 API 密钥
# --datasets    指定测试数据集路径

python ./llms_eval_api.py \
  --model qwen25coder_7b \
  --fewshot 5 \
  --datasets /home/A_master/LLMsEval/codes/datas/all/obj_6352_aigen.json \
  --apikey sk-rcewjdxsgjvupcyedtqzttgjqxsjpvhtpeavunjccuvtdesn

测试题目的格式说明

{
  "id": 1,
  "source": "owasp_mastg",
  "question": "An Android application developer is tasked with ensuring the security of sensitive user information. The developer has several options for data storage, each with different security implications. Which of the following storage options should the developer avoid to prevent exposing sensitive user data to other applications on the device?",
  "choices": [
    "A: Using SharedPreferences with MODE_PRIVATE to store user preferences and settings.",
    "B: Storing user credentials in an unencrypted SQLite database accessible to the app.",
    "C: Implementing SQLCipher to encrypt SQLite databases containing sensitive user information.",
    "D: Saving encrypted user data in the Android Keystore system."
  ],
  "answer": "B",
  "topics": [
    "ApplicationSecurity"
  ],
  "keyword": "SQLite",
  "tag": "ApplicationSecurity",
  "mission_class": "multi"
}

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
docker		docker
few_shot		few_shot
metrics		metrics
models		models
ollama		ollama
scripts		scripts
test_data		test_data
README.md		README.md
llm_eval_groq.py		llm_eval_groq.py
llms_eval.py		llms_eval.py
llms_eval_api.py		llms_eval_api.py
llms_eval_gpt.py		llms_eval_gpt.py
llms_eval_openrouter.py		llms_eval_openrouter.py
models.txt		models.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLMSafetyBenchmark介绍

使用方法

测试脚本

测试题目的格式说明

数据集分布：

About

Uh oh!

Releases

Packages

Languages

Disviel/LLMSafetyBenchmark

Folders and files

Latest commit

History

Repository files navigation

LLMSafetyBenchmark介绍

使用方法

测试脚本

测试题目的格式说明

数据集分布：

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages