A powerful testing framework for evaluating Large Language Models (LLMs), featuring a modular evaluation system and a real-time dashboard.
- 🧪 Jest-like syntax for writing LLM tests
- 📊 Real-time test execution dashboard
- 🔄 Multiple evaluation strategies:
  - Semantic similarity (embeddings, BLEU, ROUGE)
  - LLM-based evaluation
  - Rule-based validation
  - Custom evaluators
- 🔗 Chain evaluators with weights
- 🎯 Configurable thresholds and parameters
- 🚀 Support for multiple LLM providers (Anthropic, OpenAI)
- 📈 Detailed test reports and analysis