This repository provides a comprehensive suite for evaluating, red teaming, and assuring Large Language Models (LLMs) using a variety of open-source tools and custom harnesses.
- Promptfoo: Prompt evaluation and assertion framework
- DeepEval: Automated LLM evaluation harness (see the sketch after this list)
- LangTest: Language model testing and benchmarking
- Red Teaming: Configurations and scripts for adversarial testing
- Assurance Harnesses: For AI safety, compliance, and robustness
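For a taste of what these harnesses look like in practice, here is a minimal DeepEval-style check. The metric, threshold, and strings are illustrative examples rather than configs from this repository, and DeepEval's metrics typically require an LLM judge API key at runtime; verify the imports against your installed deepeval version.

```python
# Minimal DeepEval-style check; values are illustrative, not from this repo's configs.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What does this repository evaluate?",
        actual_output="It evaluates LLMs for safety, robustness, and compliance.",
    )
    metric = AnswerRelevancyMetric(threshold=0.7)  # example threshold
    assert_test(test_case, [metric])  # fails if the judged score falls below threshold
```

Checks like this can be run through pytest or DeepEval's own test runner, depending on your setup.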
Repository structure:

- `llm-testing-tools-eval/`: Main evaluation harnesses, configs, and scripts
- `llm_eval_agent/`: Custom LLM evaluation framework (API, dashboard, harness)
- `Session 1-6/`: Example sessions, challenge prompts, and test results
- `documents/`: Tool-specific documentation and usage guides
- `requirements.txt`: Python dependencies for evaluation harnesses
- `.env.example`: Example environment variables for API keys
- Clone the repository
- Copy `.env.example` to `.env` and add your API keys
- Install dependencies: `pip install -r requirements.txt`
- Run evaluation scripts or harnesses as needed
- Never commit secrets: use `.env` files for API keys and sensitive info (a loading sketch follows this list)
- Red teaming: includes adversarial prompt configs and reporting
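A minimal sketch of that pattern, assuming python-dotenv is installed; the variable name `OPENAI_API_KEY` is an example, not necessarily a key this repository expects.

```python
# Load secrets from a local .env file instead of hard-coding them.
# Assumes python-dotenv is installed; the key name below is an example only.
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment

api_key = os.getenv("OPENAI_API_KEY")
if api_key is None:
    raise RuntimeError("Missing API key: copy .env.example to .env and fill it in.")
```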
MIT License
K11 Software Solutions
For more details, see the documentation in the `documents/` folder.
The llm_eval_agent is a custom Python framework for orchestrating, automating, and visualizing LLM evaluation workflows. It provides:
- API Server (FastAPI): Run, track, and manage LLM test jobs via REST endpoints (see the client sketch after this list).
- Streamlit Dashboard: Upload data, launch tests, monitor status, and visualize results in a user-friendly UI.
- Flexible Test Harness: Supports multiple evaluation tools (LangTest, Promptfoo, DeepEval) and custom agents.
- Live Status Tracking: See all test runs, their status, and download or visualize results instantly.
- Visualization: Generate bar charts and summary plots from test results (JSON/HTML); a plotting sketch follows this list.
- Documentation: see `llm_eval_agent/README_llm_eval_agent.md` for setup, API usage, and dashboard instructions.
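To illustrate the REST workflow, here is a hypothetical client sketch. The base URL, endpoint paths, payload fields, and response shape are assumptions made for illustration, not the documented API; see `llm_eval_agent/README_llm_eval_agent.md` for the actual endpoints.

```python
# Hypothetical client for the llm_eval_agent FastAPI server.
# The paths ("/jobs", "/jobs/{id}"), payload, and response fields are
# illustrative assumptions; consult README_llm_eval_agent.md for the real API.
import requests

BASE_URL = "http://localhost:8000"  # assumed local development address

# Submit a test job specifying which harness to run.
resp = requests.post(
    f"{BASE_URL}/jobs",
    json={"tool": "deepeval", "dataset": "prompts.json"},
    timeout=30,
)
resp.raise_for_status()
job_id = resp.json()["job_id"]  # assumed response field

# Fetch the job's status record for live tracking.
status = requests.get(f"{BASE_URL}/jobs/{job_id}", timeout=30).json()
print(status)
```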
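And a sketch of the kind of summary plot the visualization step produces, assuming results are stored as JSON with per-metric scores; the file name and schema here are invented for illustration and should be adapted to the format your harness actually emits.

```python
# Plot a bar chart of metric scores from a results JSON file.
# The file name and the {"metrics": {name: score}} schema are assumptions.
import json

import matplotlib.pyplot as plt

with open("results.json") as f:
    results = json.load(f)

metrics = results["metrics"]  # e.g. {"answer_relevancy": 0.82, "toxicity": 0.03}
plt.bar(list(metrics.keys()), list(metrics.values()))
plt.ylabel("Score")
plt.title("LLM evaluation summary")
plt.tight_layout()
plt.savefig("summary.png")
```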
This framework enables robust, reproducible, and extensible LLM evaluation pipelines for research and production.
For consulting, training, or implementation support:
🔗 softwaretestautomation.org
🔗 k11softwaresolutions.com
📧 k11softwaresolutions@outlook.com