This repository presents an exploratory analysis of 500+ AI agent projects, with a primary focus on the presence and reliability of test and evaluation implementations.
The analysis aims to provide an overview of current testing practices in agent-based AI systems. In addition to identifying whether tests are present, the study extracts high-level characteristics such as frameworks, architectural patterns, and usage contexts.
The repository is organized into two main sections:
- Frameworks: analysis of AI agent frameworks and their testing support.
- Use Cases: analysis of real-world AI agent projects and how testing is applied in practice.
The goal is to support discussions in software engineering and verification & validation by highlighting gaps, trends, and opportunities for improving test practices in AI agent development.
Specifically, the analysis automatically maps and classifies AI agents across multiple frameworks, extracting insights related to:
- Implementation patterns (notebooks vs. structured repositories)
- Use of advanced techniques (RAG, multi-agent systems, workflows)
- Technical quality indicators (presence of tests, link validity)
- Application domains and task types
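As an illustration, the attributes extracted per agent could be collected into a record like the following (the field names are hypothetical, not taken from the repository):

```python
from dataclasses import dataclass

@dataclass
class AgentRecord:
    """One analyzed agent, with the characteristics extracted during classification."""
    name: str
    framework: str            # e.g. "Autogen", "CrewAI", "LangGraph", "Agno"
    source_type: str          # "notebook", "repo", "script", or "docs"
    has_tests: bool = False   # any test/evaluation evidence found
    uses_rag: bool = False
    is_multi_agent: bool = False
    link_valid: bool = False
    intent: str = "other"     # "code", "chat", "retrieval", "data", or "other"

record = AgentRecord(name="AgentEval", framework="Autogen",
                     source_type="notebook", has_tests=True)
```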
Framework-specific parsers were implemented to extract structured information from the repository documentation.
parse_langgraph_agents(readme_text)
→ Extracts: Use Case | Industry | Description | Link

Test presence was inferred using lightweight heuristics, depending on the source type.
| Context | Evidence Searched |
|---|---|
| Notebook / Script | assert, unittest, pytest, def test, @test, await aevaluate, evaluate( |
| Repository | test/, tests/, pytest.ini, tox.ini, nose.cfg, *_test.py, test_*.py |
These heuristics may generate false positives and were followed by manual verification.
Classification relied exclusively on textual metadata (Use Case, Industry, Description).
| Category | Keywords |
|---|---|
| RAG | rag, retrieval, document, knowledge base, vector |
| Multi-Agent | multi-agent, hierarchical, supervisor, collaboration |
| Workflow | graph, state, conditional (graph-based) |
| Intent | code, chat, retrieval, data |
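These keyword rules can be sketched as follows; case-insensitive substring matching and the intent priority order are assumptions, not documented behavior:

```python
# Keyword buckets from the tables above; matching runs over the concatenated
# textual metadata (Use Case, Industry, Description), lower-cased.
CATEGORY_KEYWORDS = {
    "rag": ["rag", "retrieval", "document", "knowledge base", "vector"],
    "multi-agent": ["multi-agent", "hierarchical", "supervisor", "collaboration"],
    "workflow": ["graph", "state", "conditional"],
}
INTENT_KEYWORDS = {
    "code": ["code"], "chat": ["chat"], "retrieval": ["retrieval"], "data": ["data"],
}

def classify(metadata: str):
    """Return (matched categories, intent) for one agent's textual metadata."""
    text = metadata.lower()
    categories = {cat for cat, kws in CATEGORY_KEYWORDS.items()
                  if any(kw in text for kw in kws)}
    # First intent bucket with a keyword hit wins; otherwise "other".
    intent = next((i for i, kws in INTENT_KEYWORDS.items()
                   if any(kw in text for kw in kws)), "other")
    return categories, intent
```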
Agent Distribution by Framework
| Framework | Total | Notebook | Repo | Script | Docs | Tests | Test % | RAG | Multi-Agent | Valid Links |
|---|---|---|---|---|---|---|---|---|---|---|
| Autogen | 61 | 59 | 0 | 1 | 1 | 5 | 8.2% | 10 | 13 | 60 |
| CrewAI | 22 | 0 | 22 | 0 | 0 | 0 | 0.0% | 2 | 0 | 22 |
| LangGraph | 20 | 0 | 0 | 20 | 0 | 0 | 0.0% | 10 | 3 | 0 |
| Agno | 18 | 0 | 0 | 18 | 0 | 0 | 0.0% | 6 | 0 | 0 |
Agent Distribution by Intent
| Intent | Count |
|---|---|
| Other / Generic | 55 |
| Chat | 26 |
| Retrieval | 22 |
| Code | 14 |
| Data | 4 |
Initially, five agents were flagged as potentially containing tests.
After manual inspection:
| Framework | Agent | Assessment |
|---|---|---|
| Autogen | Chat with OpenAI Assistant with Retrieval Augmentation | ❌ No real tests (analysis script only) |
| Autogen | Multimodal Agent Chat with DALLE and GPT-4V | ❌ Assertions used as exceptions |
| Autogen | Multimodal Agent Chat with Llava | ❌ Assertions not related to testing |
| Autogen | AgentEval | ✅ Valid evaluation framework demonstrated |
| Autogen | Optimize for Code Generation | ✅ Evaluation routines present |
- Total agents: 121
- Frameworks analyzed: 4
- With tests: 5 (2 validated)
- Using RAG: 28
- Multi-agent systems: 16
- Valid links: 82
- Invalid links: 39
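As a sanity check, the headline figures are consistent with the per-framework distribution table; a minimal recomputation (tallies copied from that table):

```python
# Per-framework totals and flagged tests, as reported in the distribution table.
frameworks = {
    "Autogen":   {"total": 61, "tests": 5},
    "CrewAI":    {"total": 22, "tests": 0},
    "LangGraph": {"total": 20, "tests": 0},
    "Agno":      {"total": 18, "tests": 0},
}

total_agents = sum(f["total"] for f in frameworks.values())
with_tests = sum(f["tests"] for f in frameworks.values())
test_pct = {name: round(100 * f["tests"] / f["total"], 1)
            for name, f in frameworks.items()}
```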
This section analyzes real-world AI agent projects, focusing on whether testing artifacts are present and what they actually validate.
| Agent | Domain | Tests | Evidence | Created | Last Commit | Manual Test Analysis | Notes |
|---|---|---|---|---|---|---|---|
| Product Recommendation Agent | Retail | ✅ | tests/ folder | 2023-09-07 | 2025-09-28 | LLM-as-a-judge evaluations without testing framework | |
| Property Pricing Agent | Real Estate | ✅ | tests/ folder | 2024-07-27 | 2026-01-11 | Utility-level tests, minimal integration | |
| Energy Demand Forecasting Agent | Energy | ✅ | agent_evaluation/ | 2024-07-01 | 2024-07-02 | Performance metrics vs ground truth, not software tests | |
| Recruitment Recommendation Agent | HR | ✅ | test/ folder | 2024-08-17 | 2024-09-09 | Scenario execution, deterministic checks | |
| Logistics Optimization Agent | Supply Chain | ✅ | test/ subfolder | 2023-07-31 | 2025-11-25 | Module-level tests unrelated to agent behavior | |
➡️ In most cases, tests validate auxiliary code or models, not the agent’s behavior, reasoning, or decision-making.
The vast majority of AI agent projects do not implement meaningful tests. When tests are present, they typically target infrastructure or model performance, rather than agent-level correctness or behavioral guarantees.