This is the first prototype of an "end-to-end" cycle of Agent Arena.
It includes: (1) data curation, (2) a hacky API for fetching the data, (3) a fully independent "agent" doing the job, (4) "submitting" the job, and (5) grading via LLM. This demo is meant to show the full flow of Agent Arena and should not be taken as actual implementation work on Agent Arena. To be clear: THIS DEMO IS ONLY MEANT TO BE A COMMUNICATION TOOL.
- Python 3.12.9
- pip (Python package installer)
- Create a virtual environment:
    python3.12 -m venv venv
- Activate the virtual environment:
  - On macOS/Linux:
      source venv/bin/activate
  - On Windows:
      .\venv\Scripts\activate
- Install required packages:
    pip install -r requirements.txt
- To deactivate the virtual environment when you're done:
    deactivate

Note: Make sure to activate the virtual environment before running any Python scripts or notebooks in this project.
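
If it helps to confirm the environment before running anything, a minimal sketch of such a check follows. It is not part of this project: the helper names `in_virtualenv` and `python_matches` are hypothetical, and the check only covers environments created with `venv`, where `sys.prefix` differs from `sys.base_prefix`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def python_matches(major: int = 3, minor: int = 12) -> bool:
    """Return True when the running interpreter is the given major.minor version."""
    # Hypothetical helper: the defaults mirror the Python 3.12 requirement above.
    return sys.version_info[:2] == (major, minor)
```

Calling both helpers at the top of a script or notebook and stopping when either returns False would catch a forgotten activation step early.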
Create the directories data and output in the project root, then move df_randomized.csv into data. Everything else in the demo requires that file to be there.
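
The directory setup can also be scripted. The following is a minimal sketch, assuming that data and output sit directly under the project root; the function name `prepare_workspace` and its parameters are hypothetical and do not come from this project.

```python
from pathlib import Path
import shutil


def prepare_workspace(project_root: Path, csv_source: Path) -> Path:
    """Create data/ and output/ under project_root and move the CSV into data/.

    Hypothetical helper: the layout (both directories directly under the
    project root) is an assumption based on the instructions above.
    """
    data_dir = project_root / "data"
    output_dir = project_root / "output"
    # exist_ok=True makes the call safe to repeat on an existing workspace.
    data_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)

    target = data_dir / "df_randomized.csv"
    if not target.exists():
        if not csv_source.exists():
            raise FileNotFoundError(f"Cannot find {csv_source}")
        shutil.move(str(csv_source), str(target))
    return target
```

For example, `prepare_workspace(Path.cwd(), Path.cwd() / "df_randomized.csv")` would set up the workspace when the CSV currently sits in the project root. If the file is already in data, the function leaves it untouched.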