[Note: edited for clarification]
Dear authors,
I was trying to run the ITAS algorithm on the GSM8K benchmark to get a task-specific ARCHON architecture, but I'm stuck because the benchmark appears to be unsupported.
I can see that the scripts provided under benchmarks/ and benchmarks/gsm8k can generate and evaluate answers.
However, it seems that the itas_algorithm script in the currently released version supports only "mt_bench" and "arena_hard_auto":
    if self.search_config["benchmark"] in ["mt_bench", "arena_hard_auto"]:
Please let me know if I'm wrong, and which steps are necessary to get a task-specific ARCHON architecture for GSM8K.
My intuition is that I need to add a GSM8K entry to the question map used in power_ranker:
Archon/src/archon/itas_algorithms/power_ranker.py
Lines 24 to 27 in d45892c
    QUESTION_MAP = {
        "arena_hard_auto": "archon/benchmarks/arena_hard_auto/arena_questions.jsonl",
        "mt_bench": "archon/benchmarksmt_bench/FastChat/fastchat/llm_judge/data/mt_bench/question.jsonl",
    }
as well as add some logic to compare each generated answer against the correct one. Is my intuition correct? Do you plan to update the code with this logic?
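For concreteness, here is a minimal sketch of the kind of comparison logic I have in mind. It assumes that each GSM8K reference solution ends with a `#### <number>` line (the format of the public dataset) and that the model's final answer is the last number in its output. The function names are hypothetical and not part of Archon, and I have not wired this into itas_algorithm or power_ranker.

```python
import re
from typing import Optional

# Integer or decimal, optionally negative, with optional thousands separators.
_NUMBER = r"-?\d[\d,]*(?:\.\d+)?"


def _normalize(number: str) -> str:
    """Drop thousands separators so that '1,234' and '1234' compare equal."""
    return number.replace(",", "")


def extract_reference(solution: str) -> Optional[str]:
    """Return the number after the '####' marker of a GSM8K reference solution."""
    match = re.search(r"####\s*(" + _NUMBER + ")", solution)
    return _normalize(match.group(1)) if match else None


def extract_prediction(generation: str) -> Optional[str]:
    """Assumption: the model's final answer is the last number it wrote."""
    numbers = re.findall(_NUMBER, generation)
    return _normalize(numbers[-1]) if numbers else None


def is_correct(generation: str, solution: str) -> bool:
    """Compare the predicted and reference answers numerically."""
    predicted = extract_prediction(generation)
    reference = extract_reference(solution)
    if predicted is None or reference is None:
        return False
    return float(predicted) == float(reference)
```

For example, `is_correct("So she makes $18 per day.", "... #### 18")` would count as a match, and a generation with no number in it would count as incorrect. A correctness signal like this could presumably stand in for the judge-based ranking used for mt_bench and arena_hard_auto, but that part is a guess on my side.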
Thanks in advance!