Add Mind2Web evaluation task with two candidate pool sizes #25

Open
nuglifeleoji wants to merge 1 commit into ace-agent:main from nuglifeleoji:pr/mind2web

@nuglifeleoji

Adds a Mind2Web web navigation task to the ACE framework, in two variants:

  • mind2web: ~200 candidate elements per step (199 negatives plus the positive elements)
  • mind2web2: ~50 candidate elements per step (49 negatives plus the positive elements)
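
As a rough illustration of how the two pool sizes differ, the sketch below builds a candidate pool by keeping every positive element and sampling a fixed number of negatives. It is not taken from this PR's prepare_data.py: the function name `build_candidate_pool`, the seed handling, and the toy element strings are all assumptions made for the example.

```python
import random

def build_candidate_pool(positives, negatives, num_negatives, seed=0):
    """Keep all positives, sample up to num_negatives negatives, then shuffle.

    Hypothetical helper for illustration; the PR's actual sampling may differ.
    """
    rng = random.Random(seed)
    k = min(num_negatives, len(negatives))
    pool = list(positives) + rng.sample(list(negatives), k)
    rng.shuffle(pool)  # avoid leaking the answer through position
    return pool

# Toy step with one positive element and 500 negatives (made-up identifiers).
positives = ["button#search"]
negatives = [f"div#{i}" for i in range(500)]

large_pool = build_candidate_pool(positives, negatives, num_negatives=199)
small_pool = build_candidate_pool(positives, negatives, num_negatives=49)
print(len(large_pool), len(small_pool))  # 200 50
```

With a single positive the pools contain exactly 200 and 50 elements; steps with several positives end up slightly larger, which matches the "~" in the sizes above.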

Each task includes:

  • prepare_data.py: Downloads Mind2Web from HuggingFace, converts it to step-level ACE samples formulated as candidate element selection, and performs a train/val/test split stratified by domain
  • data_processor.py: Three-level evaluation (element index, operation type, and value matching) with flexible parsing
  • run.py: Standard ACE training/evaluation script with offline, online, and eval_only modes
  • data/sample_config.json: Data path configuration
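
The three-level evaluation in data_processor.py could look roughly like the sketch below. The answer format (`Element: ... / Action: ... / Value: ...`), the function names `parse_prediction` and `score_step`, and the way the three checks are combined into a step-level result are assumptions for illustration only; the PR description does not specify them. The operation names CLICK, TYPE, and SELECT are the ones Mind2Web uses.

```python
import re

def parse_prediction(text):
    """Leniently extract element index, operation, and value from free text.

    Hypothetical parser: tolerates case differences, ':' or '=', and brackets.
    """
    elem = re.search(r"element\s*[:=]?\s*\[?(\d+)\]?", text, re.IGNORECASE)
    op = re.search(r"\b(CLICK|TYPE|SELECT)\b", text, re.IGNORECASE)
    val = re.search(r"value\s*[:=]\s*(.+)", text, re.IGNORECASE)
    return {
        "element": int(elem.group(1)) if elem else None,
        "op": op.group(1).upper() if op else None,
        "value": val.group(1).strip().strip("\"'").lower() if val else "",
    }

def score_step(pred_text, gold):
    """Return one boolean per level, plus an all-levels-correct flag."""
    pred = parse_prediction(pred_text)
    element_ok = pred["element"] == gold["element"]
    op_ok = pred["op"] == gold["op"]
    value_ok = pred["value"] == gold.get("value", "").strip().lower()
    return {
        "element": element_ok,
        "op": op_ok,
        "value": value_ok,
        "step": element_ok and op_ok and value_ok,
    }

gold = {"element": 12, "op": "TYPE", "value": "New York"}
result = score_step('Element: 12\nAction: TYPE\nValue: "New York"', gold)
loose = score_step("element=[7] click", {"element": 7, "op": "CLICK", "value": ""})
wrong = score_step("Element: 3\nAction: TYPE\nValue: new york", gold)
```

Here `result` and `loose` pass all three levels despite different surface formats, while `wrong` gets the operation and value right but fails on the element index.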

The two versions enable studying the effect of candidate pool size on ACE's context learning performance for web agent tasks.
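
For reference, a stratified train/val/test split by domain, as prepare_data.py is described as doing, can be sketched as follows. The 80/10/10 ratios, the `domain` key, the function name `stratified_split`, and splitting individual samples rather than whole tasks are assumptions; the PR description does not state them.

```python
import random
from collections import defaultdict

def stratified_split(samples, key="domain", ratios=(0.8, 0.1, 0.1), seed=0):
    """Split samples so each domain contributes to train, val, and test.

    Hypothetical helper; ratios and grouping key are illustrative assumptions.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample)

    train, val, test = [], [], []
    for domain in sorted(groups):  # sorted for a reproducible order
        items = groups[domain]
        rng.shuffle(items)
        n_train = int(len(items) * ratios[0])
        n_val = int(len(items) * ratios[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # remainder goes to test
    return train, val, test

# Toy data: 10 samples in each of three made-up domains.
samples = [{"id": f"{d}-{i}", "domain": d}
           for d in ("travel", "shopping", "info") for i in range(10)]
train, val, test = stratified_split(samples)
print(len(train), len(val), len(test))  # 24 3 3
```

Splitting within each domain keeps the domain mix the same across the three splits, which is the point of stratifying.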

Alex-q-z self-requested a review on February 13, 2026 at 22:19