500 AI Agent Projects Landscape

This repository presents an exploratory analysis of the 500+ AI Agent Projects repository, with a primary focus on the presence and reliability of test and evaluation implementations.

The analysis aims to provide an overview of current testing practices in agent-based AI systems. In addition to identifying whether tests are present, the study extracts high-level characteristics such as frameworks, architectural patterns, and usage contexts.

The repository is organized into two main sections:

Frameworks: analysis of AI agent frameworks and their testing support.

Use Cases: analysis of real-world AI agent projects and how testing is applied in practice.

The goal is to support discussions in software engineering and verification & validation by highlighting gaps, trends, and opportunities for improving test practices in AI agent development.


🎯 Objectives

Primary Objective

To automatically map and classify AI agents from multiple frameworks, extracting insights related to:

  • Implementation patterns (notebooks vs. structured repositories)
  • Use of advanced techniques (RAG, multi-agent systems, workflows)
  • Technical quality indicators (presence of tests, link validity)
  • Application domains and task types
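
The link-validity indicator above can be checked with a two-step probe: a cheap syntactic check, then an HTTP `HEAD` request. This is a sketch using only the Python standard library; the study's actual validation procedure is not documented, so the status-code threshold is an assumption.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen

def is_wellformed(url: str) -> bool:
    """Cheap syntactic check before any network call."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """HEAD request; treats any status below 400 as a valid link.
    Network-dependent, so results may vary between runs."""
    if not is_wellformed(url):
        return False
    try:
        req = Request(url, method="HEAD")
        return urlopen(req, timeout=timeout).status < 400
    except Exception:
        return False
```

A dead GitHub link typically returns 404, which this check classifies as invalid.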

Colab with data analysis

Frameworks Use case table


🔬 Methodology

Data Extraction

Framework-specific parsers were implemented to extract structured information from the repository documentation.

`parse_langgraph_agents(readme_text)` → extracts: Use Case | Industry | Description | Link
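
A minimal sketch of such a parser, assuming the source README lists agents as markdown table rows of the form `| Use Case | Industry | Description | Link |`; the actual parsers in this repository may handle additional formats.

```python
import re

def parse_langgraph_agents(readme_text: str) -> list[dict]:
    """Extract agent entries from markdown table rows.
    Simplified sketch: skips the header and separator rows,
    and pulls the URL out of a [text](url) markdown link."""
    agents = []
    for line in readme_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip rows that are not 4-column data rows, the header,
        # and the |---|---| separator row.
        if len(cells) != 4 or cells[0] == "Use Case" or set(cells[0]) <= {"-", ":", " "}:
            continue
        use_case, industry, description, link = cells
        m = re.search(r"\((https?://[^)]+)\)", link)
        agents.append({
            "use_case": use_case,
            "industry": industry,
            "description": description,
            "link": m.group(1) if m else link,
        })
    return agents
```
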

Test Detection Heuristics (Contextual Approach)

Test presence was inferred using lightweight heuristics, depending on the source type.

| Context | Evidence searched |
|---|---|
| Notebook / Script | `assert`, `unittest`, `pytest`, `def test`, `@test`, `await aevaluate`, `evaluate(` |
| Repository | `test/`, `tests/`, `pytest.ini`, `tox.ini`, `nose.cfg`, `*_test.py`, `test_*.py` |

These heuristics may generate false positives and were followed by manual verification.

Classification Heuristics

Classification relied exclusively on textual metadata (Use Case, Industry, Description).

| Category | Keywords |
|---|---|
| RAG | rag, retrieval, document, knowledge base, vector |
| Multi-Agent | multi-agent, hierarchical, supervisor, collaboration |
| Workflow | graph, state, conditional (graph-based) |
| Intent | code, chat, retrieval, data |
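
Keyword matching over the concatenated metadata fields can be sketched as follows. This is a simplified illustration of the approach, not the study's exact classifier, and it shares the same weakness: substring matches (e.g. "rag" inside "storage") can produce false positives.

```python
# Keyword lists taken from the classification table above.
CATEGORY_KEYWORDS = {
    "RAG": ["rag", "retrieval", "document", "knowledge base", "vector"],
    "Multi-Agent": ["multi-agent", "hierarchical", "supervisor", "collaboration"],
    "Workflow": ["graph", "state", "conditional"],
}

def classify_agent(use_case: str, industry: str, description: str) -> list[str]:
    """Return every category whose keyword list matches the
    concatenated, lowercased textual metadata. An agent may
    fall into multiple categories (e.g. RAG + Multi-Agent)."""
    text = " ".join([use_case, industry, description]).lower()
    return [cat for cat, kws in CATEGORY_KEYWORDS.items()
            if any(kw in text for kw in kws)]
```
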

Results — Framework-Level Analysis

Agent Distribution by Framework

| Framework | Total | Notebook | Repo | Script | Docs | Tests | Test % | RAG | Multi-Agent | Valid Links |
|---|---|---|---|---|---|---|---|---|---|---|
| Autogen | 61 | 59 | 0 | 1 | 1 | 5 | 8.2% | 10 | 13 | 60 |
| CrewAI | 22 | 0 | 22 | 0 | 0 | 0 | 0.0% | 2 | 0 | 22 |
| LangGraph | 20 | 0 | 0 | 20 | 0 | 0 | 0.0% | 10 | 3 | 0 |
| Agno | 18 | 0 | 0 | 18 | 0 | 0 | 0.0% | 6 | 0 | 0 |

Agent Intent Distribution

| Intent | Count |
|---|---|
| Other / Generic | 55 |
| Chat | 26 |
| Retrieval | 22 |
| Code | 14 |
| Data | 4 |

Detected Tests — Manual Validation

Initially, five agents were flagged as potentially containing tests.

After manual inspection:

| Framework | Agent | Assessment |
|---|---|---|
| Autogen | Chat with OpenAI Assistant with Retrieval Augmentation | ❌ No real tests (analysis script only) |
| Autogen | Multimodal Agent Chat with DALLE and GPT-4V | ❌ Assertions used as exceptions |
| Autogen | Multimodal Agent Chat with Llava | ❌ Assertions not related to testing |
| Autogen | AgentEval | ✅ Valid evaluation framework demonstrated |
| Autogen | Optimize for Code Generation | ✅ Evaluation routines present |

Overall Statistics

  • Total agents: 121
  • Frameworks analyzed: 4
  • With tests: 5 (2 validated)
  • Using RAG: 28
  • Multi-agent systems: 16
  • Valid links: 82 / Invalid links: 39


Use Case Section — Real Projects

This section analyzes real-world AI agent projects, focusing on whether testing artifacts are present and what they actually validate.

| Agent | Domain | Tests Evidence | Created | Last Commit | Manual Test Analysis Notes |
|---|---|---|---|---|---|
| Product Recommendation Agent | Retail | tests/ folder | 2023-09-07 | 2025-09-28 | LLM-as-a-judge evaluations without testing framework |
| Property Pricing Agent | Real Estate | tests/ folder | 2024-07-27 | 2026-01-11 | Utility-level tests, minimal integration |
| Energy Demand Forecasting Agent | Energy | agent_evaluation/ | 2024-07-01 | 2024-07-02 | Performance metrics vs ground truth, not software tests |
| Recruitment Recommendation Agent | HR | test/ folder | 2024-08-17 | 2024-09-09 | Scenario execution, deterministic checks |
| Logistics Optimization Agent | Supply Chain | test/ subfolder | 2023-07-31 | 2025-11-25 | Module-level tests unrelated to agent behavior |

➡️ In most cases, tests validate auxiliary code or models, not the agent’s behavior, reasoning, or decision-making.

Key Insight

The vast majority of AI agent projects do not implement meaningful tests. When tests are present, they typically target infrastructure or model performance, rather than agent-level correctness or behavioral guarantees.
