Question: How to run itas algorithm for each benchmark besides mt_bench and arena? For example gsm8k?

[Note: edited for clarification]

Dear authors,

I was trying to run the ITAS algorithm on the `GSM8K` benchmark to get a task-specific ARCHON architecture. Unfortunately, I'm stuck on an `unsupported benchmark` error.

I can see that the scripts provided under the `benchmarks/` and `benchmarks/gsm8k` directories can generate and evaluate answers.
However, it seems that the `itas_algorithm` script in the currently released version supports only "mt_bench" and "arena_hard_auto": https://github.com/ScalingIntelligence/Archon/blob/d45892c417e09165f239ec71769e314d46f3d28a/src/archon/itas_algorithms/itas_algorithm.py#L150

Please let me know if I'm wrong, and which steps are necessary to get a task-specific ARCHON architecture.

My intuition is that I need to add a question map entry for use in `power_ranker`: https://github.com/ScalingIntelligence/Archon/blob/d45892c417e09165f239ec71769e314d46f3d28a/src/archon/itas_algorithms/power_ranker.py#L24-L27

as well as add some logic to compare the generated answer against the correct one. Is my intuition correct? Do you plan to update the code with this logic by any chance?
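
To make the comparison step concrete, here is a rough sketch of the kind of check I have in mind. It is not taken from the Archon code: `last_number` and `is_correct` are hypothetical names, and it assumes the reference solutions follow the GSM8K convention of ending with `#### <final answer>`.

```python
import re
from decimal import Decimal
from typing import Optional

# Matches integers and decimals, optionally negative, with thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text: str) -> Optional[Decimal]:
    """Return the last number mentioned in `text`, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    # Drop thousands separators (and any trailing comma picked up by the regex).
    return Decimal(matches[-1].replace(",", ""))


def is_correct(generated: str, reference: str) -> bool:
    """Compare a generated answer with a GSM8K-style reference solution.

    Assumption: the final answer in `reference` follows the last '####'.
    """
    predicted = last_number(generated)
    gold = last_number(reference.split("####")[-1])
    return predicted is not None and gold is not None and predicted == gold
```

Taking the last number in the generated text is only a heuristic. A stricter variant could ask the model to emit its answer in a fixed format and parse only that.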

Thanks in advance!

	QUESTION_MAP = {
	"arena_hard_auto": "archon/benchmarks/arena_hard_auto/arena_questions.jsonl",
	"mt_bench": "archon/benchmarksmt_bench/FastChat/fastchat/llm_judge/data/mt_bench/question.jsonl",
	}
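
As a sketch of what adding a benchmark to such a map could look like: `with_benchmark` is a hypothetical helper, and the `gsm8k` path below is a made-up placeholder, not a file I have confirmed exists in the repository.

```python
from typing import Dict


def with_benchmark(question_map: Dict[str, str], name: str, path: str) -> Dict[str, str]:
    """Return a copy of `question_map` with one extra benchmark registered."""
    if name in question_map:
        raise ValueError(f"benchmark {name!r} is already registered")
    extended = dict(question_map)
    extended[name] = path
    return extended


# Existing entry copied from the snippet above.
base_map = {
    "arena_hard_auto": "archon/benchmarks/arena_hard_auto/arena_questions.jsonl",
}

# Hypothetical path: the real location of a GSM8K question file is an assumption.
extended_map = with_benchmark(base_map, "gsm8k", "archon/benchmarks/gsm8k/questions.jsonl")
```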