-The green agent uses LLM-as-a-judge for evaluation, which requires an OpenAI API key. You need to add this key as a GitHub secret:
-
-1. Go to your repository's **Settings** → **Secrets and variables** → **Actions**
-2. Click **New repository secret**
-3. Set the following:
-   - **Name**: `OPENAI_API_KEY`
-   - **Secret**: Your OpenAI API key (starts with `sk-`)
-4. Click **Add secret**
-
-**Note:** This API key will be used by the green agent to evaluate participant submissions using LLM-as-a-judge methodology.
-
 ### Request Access to Hugging Face Dataset
 
 This assessment requires access to the FieldWorkArena dataset hosted on Hugging Face. To request access:
@@ -43,6 +30,34 @@ This assessment requires access to the FieldWorkArena dataset hosted on Hugging
 
 **Note:** You must have an approved access token before running the benchmark tasks. Please note that access permission handling procedures may be subject to change.
 
+### Set up GitHub Secrets
+
+The benchmark requires two secrets to be configured in your GitHub repository:
+
+#### OPENAI_API_KEY
+The green agent uses LLM-as-a-judge for evaluation, which requires an OpenAI API key.
+
+1. Go to your repository's **Settings** → **Secrets and variables** → **Actions**
+2. Click **New repository secret**
+3. Set the following:
+   - **Name**: `OPENAI_API_KEY`
+   - **Secret**: Your OpenAI API key (starts with `sk-`)
+4. Click **Add secret**
+
+**Note:** This API key will be used by the green agent to evaluate participant submissions using LLM-as-a-judge methodology.
+
+#### HF_TOKEN
+The Hugging Face access token is required to access the FieldWorkArena dataset.
+
+1. Go to your repository's **Settings** → **Secrets and variables** → **Actions**
+2. Click **New repository secret**
+3. Set the following:
+   - **Name**: `HF_TOKEN`
+   - **Secret**: Your Hugging Face access token (obtained from the dataset access request)
+4. Click **Add secret**
+
+**Note:** This token must have the necessary permissions to access the FieldWorkArena dataset on Hugging Face.
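
A missing secret usually only shows up as a failure partway through a run, so a small pre-flight check can help. The sketch below is illustrative and not part of this commit: it assumes the workflow exposes both secrets as environment variables under the same names, and `missing_secrets` is a hypothetical helper.

```python
import os

# Secret names taken from the setup steps in this section.
REQUIRED_SECRETS = ("OPENAI_API_KEY", "HF_TOKEN")


def missing_secrets(env=None):
    """Return the required secret names that are unset or blank in env.

    Assumption: secrets reach the process as environment variables
    with the same names as the GitHub secrets.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
```

Calling `missing_secrets()` with no argument inspects the current process environment; an empty list means both values are present.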
+
 ## Scoring
 
 Participant agents are evaluated on their ability to accurately complete real-world field tasks. The evaluation uses LLM-as-a-judge methodology to assess:
@@ -66,15 +81,17 @@ Specifies the target category of tasks to run. Available options:
 - `"all"`: Runs all available task categories
 
 ### token
-Your FieldWorkArena-authenticated Hugging Face access token. This token is required to access the FieldWorkArena dataset. See the [Prerequisites](#request-access-to-hugging-face-dataset) section for instructions on obtaining this token.
+Your FieldWorkArena-authenticated Hugging Face access token. This token is required to access the FieldWorkArena dataset.
 
-Example configuration in `scenario.toml`:
+In `scenario.toml`, configure it to use the `HF_TOKEN` environment variable from GitHub Secrets:
 ```toml
 [config]
 target = "factory"
-token = "hf_xxxxxxxxxxxxxxxxxxxxx"
+token = "${HF_TOKEN}"
 ```
 
+**Note:** The `${HF_TOKEN}` value will be automatically replaced with your Hugging Face access token from GitHub Secrets during GitHub Actions workflow execution. See the [Prerequisites](#set-up-github-secrets) section for instructions on setting up this secret.
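
The commit does not show how the workflow performs this replacement. As a rough illustration of the idea only, the sketch below expands `${NAME}` placeholders with Python's standard `string.Template`; the helper name `expand_placeholders` and the sample token value are hypothetical, and the real workflow may use a different mechanism.

```python
import os
from string import Template


def expand_placeholders(text, env=None):
    """Replace ${NAME} placeholders in text with values from env.

    Raises KeyError when a placeholder has no matching variable, so a
    missing secret fails loudly instead of leaving "${HF_TOKEN}" in place.
    """
    env = os.environ if env is None else env
    return Template(text).substitute(env)


# Same fragment as the scenario.toml example, with a made-up token value.
scenario = '[config]\ntarget = "factory"\ntoken = "${HF_TOKEN}"\n'
expanded = expand_placeholders(scenario, {"HF_TOKEN": "hf_example"})
```

Failing on an unresolved placeholder is a deliberate choice in this sketch: a literal `${HF_TOKEN}` string passed as a token would otherwise surface later as a less obvious authentication error.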
+
 ## Requirements for participant agent
 
 Participant agents (Purple Agents) must meet the following requirements: