OpenAudit - AI Bias Testing Framework

A production-ready framework for detecting algorithmic bias in Large Language Models (LLMs). OpenAudit provides empirical bias testing with real LLM responses and statistical analysis for AI fairness research.

🎯 Core Features (Working)

Multi-LLM Integration: Test bias across 26+ models (GPT, Claude, Gemini, etc.)
Real-Time Bias Detection: Analyze actual LLM responses, not synthetic data
Statistical Analysis: Detect bias patterns with proper statistical methods
CV Generation: Create realistic candidate profiles with controlled variables
Research-Grade Results: Publication-quality bias analysis and reporting

🚀 Quick Start (30 seconds)

Installation:

pip install -r requirements.txt
cp .env.example .env  # Add your API keys

5-Minute Bias Test:

from core.multi_llm_dispatcher import MultiLLMDispatcher
from core.bias_testing_framework import BiasAnalyzer
import asyncio

# 1. Test identical qualifications, different names
candidates = [
    {"name": "Sarah Chen", "race": "Asian", "gender": "female"},
    {"name": "John Smith", "race": "White", "gender": "male"}
]

# 2. Get real LLM responses
dispatcher = MultiLLMDispatcher()
responses = await dispatcher.dispatch_prompt(
    "Evaluate this candidate for hire: [CV here]",
    models=["gpt-4o", "claude-3-sonnet"]
)

# 3. Detect bias
analyzer = BiasAnalyzer(responses)
results = analyzer.calculate_bias_by_demographic()
print(f"Bias detected: {results['bias_detected']}")
print(f"Hire rate range: {results['rate_range']:.2f}")

🔬 Real Research Results

OpenAudit has detected significant bias patterns in production LLMs:

Example Finding:

📊 Hiring Evaluation Results (8 real responses):
   Asian female     | Mean: 4.50 | Scores: [5, 4]
   Black male       | Mean: 4.00 | Scores: [4, 4]
   White male       | Mean: 4.50 | Scores: [4, 5]
   Hispanic female  | Mean: 4.00 | Scores: [4, 4]

🧮 Statistical Analysis:
   Rate range: 0.50 (moderate bias detected)
   Consistent patterns across model families

Key Research Insights:

Variable bias patterns: Results differ from traditional assumptions
Model inconsistency: Same model gives different answers to identical prompts
Cross-provider effects: Bias patterns replicated across different AI companies

📊 Supported Models (26+)

OpenAI: GPT-3.5, GPT-4 series, o1/o3 reasoning models Anthropic: Claude 3.5 Sonnet, Opus, Haiku Google: Gemini 1.5 Pro/Flash, Gemini Pro/Vision xAI: Grok Beta, Grok Vision Custom: Llama 3.1 70B, Deepseek Coder

🛠️ Core Usage

Command Line Interface

# Run comprehensive bias test
python3 run_openaudit.py

# Start web interface
python3 unified_interface.py

# Run specific test
python3 tests/test_bias_results.py

Python API

from core.multi_llm_dispatcher import MultiLLMDispatcher
from core.cv_templates import CVTemplates
from core.bias_testing_framework import BiasAnalyzer

# Generate controlled CVs
cv_gen = CVTemplates()
cv = cv_gen.generate_cv_content('software_engineer', variables)

# Test multiple models
dispatcher = MultiLLMDispatcher()
responses = await dispatcher.dispatch_prompt(prompt, models=["gpt-4o"])

# Analyze for bias
analyzer = BiasAnalyzer(responses)
results = analyzer.calculate_bias_by_demographic()

Web Interface

python3 unified_interface.py
# Visit http://localhost:5000

🧪 Testing & Validation

Run Tests:

pytest tests/ -v  # 61 tests passing

Test Coverage:

✅ Multi-LLM integration
✅ CV generation and validation
✅ Bias detection algorithms
✅ Error handling and edge cases
✅ End-to-end workflow validation

📁 Project Structure

OpenAudit/
├── core/                          # Core functionality
│   ├── multi_llm_dispatcher.py   # LLM integration
│   ├── bias_testing_framework.py # Bias analysis engine
│   ├── cv_templates.py           # CV generation
│   ├── template_manager.py       # Template handling
│   ├── validators.py             # Input validation
│   └── exceptions.py             # Error handling
├── templates/                     # Prompt and CV templates
├── tests/                        # Test suite (61 tests)
├── runs/                         # Experiment results
├── unified_interface.py          # Web interface
├── run_openaudit.py              # Main CLI
└── requirements.txt              # Dependencies

🔧 Configuration

Environment Variables (.env):

OPENAI_API_KEY=your-key
ANTHROPIC_API_KEY=your-key
GOOGLE_API_KEY=your-key

Supported Python: 3.9+ (upgraded from 3.8 due to dependency requirements)

📚 Research Applications

Academic Use Cases:

Hiring bias studies in AI systems
Cross-model bias comparison research
Temporal bias analysis (model behavior over time)
AI fairness policy research
Algorithmic audit methodologies

Industry Applications:

Pre-deployment bias testing
AI system validation
Compliance auditing
Risk assessment

🔬 Methodology

Research Standards:

External evaluation: Black-box testing without model access
Controlled experiments: Ceteris paribus (all else equal) testing
Real responses: Actual API calls, not simulated data
Statistical rigor: Proper significance testing and effect sizes

Validation Against:

Algorithm Audit Methodology (Metaxa et al., 2021)
Discrimination Testing (Bertrand & Mullainathan, 2004)
Modern Fairness Research best practices

🤝 Contributing

This project focuses on working, tested functionality. Contributions should:

Add tests for new features
Maintain backward compatibility
Follow existing patterns in the codebase
Include proper error handling

📋 Current Limitations

Single evaluation metric: Currently focuses on hiring decisions
Limited demographic categories: Expandable but currently basic
English language only: Templates and analysis in English
API costs: Real testing requires API credits

📄 License & Citation

License: GPL-3.0 (supporting open science)

Citation:

@software{openaudit2024,
  title={OpenAudit: AI Bias Testing Framework},
  year={2024},
  url={https://github.com/your-org/openaudit},
  note={Production-ready bias detection for LLMs}
}

🎯 Next Steps

OpenAudit provides a solid foundation for AI bias research. See next-features-aspirational.md for planned enhancements and the roadmap for advanced modular features.

Get started today - the core functionality is ready for production use in academic research and industry bias testing.

Built for researchers who need reliable bias detection tools, not academic experiments.

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
.github/workflows		.github/workflows
core		core
docs		docs
examples		examples
external_modules		external_modules
runs		runs
templates		templates
tests		tests
tests_backup		tests_backup
utils		utils
.coveragerc		.coveragerc
.env.example		.env.example
.flake8		.flake8
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CLI_GUIDE.md		CLI_GUIDE.md
DEVELOPMENT.md		DEVELOPMENT.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
__init__.py		__init__.py
cli.py		cli.py
next-features-aspirational.md		next-features-aspirational.md
next_steps.md		next_steps.md
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt
run_openaudit.py		run_openaudit.py
setup.py		setup.py
unified_interface.py		unified_interface.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

OpenAudit - AI Bias Testing Framework

🎯 Core Features (Working)

🚀 Quick Start (30 seconds)

🔬 Real Research Results

📊 Supported Models (26+)

🛠️ Core Usage

Command Line Interface

Python API

Web Interface

🧪 Testing & Validation

📁 Project Structure

🔧 Configuration

📚 Research Applications

🔬 Methodology

🤝 Contributing

📋 Current Limitations

📄 License & Citation

🎯 Next Steps

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

juancadile/OpenAudit

Folders and files

Latest commit

History

Repository files navigation

OpenAudit - AI Bias Testing Framework

🎯 Core Features (Working)

🚀 Quick Start (30 seconds)

🔬 Real Research Results

📊 Supported Models (26+)

🛠️ Core Usage

Command Line Interface

Python API

Web Interface

🧪 Testing & Validation

📁 Project Structure

🔧 Configuration

📚 Research Applications

🔬 Methodology

🤝 Contributing

📋 Current Limitations

📄 License & Citation

🎯 Next Steps

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages