Hi Dan, hope all is well
Would love an addition to benchy that compares small, fast models at @2 or @3 passes.
If a model is 4x faster, you can give it more passes and see if & how accuracy improves. I'm thinking R1 1.5B @3 passes compared to 72B @1 pass
What do you think?
Even a "Generic" 2nd or 3rd step such as:
{chat_history}
Would you change anything about your previous answer? Are there any corrections, fixes, or improvements you can make? Completing the user's task accurately and correctly is incredibly important.
Or something like that might actually improve accuracy
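A minimal sketch of what that @N-pass loop could look like — `call_model` here is just a stand-in for whatever client benchy already uses, and the toy model below exists only so the sketch runs:

```python
# Hypothetical @N-pass self-refinement loop. The refine prompt is the
# "generic" 2nd/3rd step from above; call_model is an assumed interface.
REFINE_PROMPT = (
    "{chat_history}\n"
    "Would you change anything about your previous answer? Are there any "
    "corrections, fixes, or improvements you can make? Completing the "
    "user's task accurately and correctly is incredibly important."
)

def run_passes(call_model, task: str, passes: int = 3) -> str:
    """Run the task once, then feed the transcript back passes - 1 times."""
    history = f"User: {task}"
    answer = call_model(history)
    for _ in range(passes - 1):
        history += f"\nAssistant: {answer}"
        answer = call_model(REFINE_PROMPT.format(chat_history=history))
    return answer

# Toy stand-in model so the sketch is runnable: counts how often it's called.
calls = []
def toy_model(prompt: str) -> str:
    calls.append(prompt)
    return f"answer v{len(calls)}"

print(run_passes(toy_model, "2 + 2 = ?", passes=3))  # → answer v3
```

So R1 1.5B @3 is just `run_passes(r1_client, task, passes=3)` scored against the 72B @1 baseline.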
And I bet you can use benchy to test which "generic" 2nd-pass prompt works best, or even build a router to different 2nd or 3rd passes that are domain- or context-specific, and benchmark the router flows
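The router could start as dumb as keyword matching — everything below (the prompt table, the keywords) is made up for illustration, since benchy would supply the real routes:

```python
# Hypothetical router: pick a domain-specific 2nd-pass prompt by keyword,
# so benchy can score each route separately. Names/keywords are assumptions.
SECOND_PASS_PROMPTS = {
    "code": "{chat_history}\nRe-check your code for bugs, edge cases, and "
            "off-by-one errors. Fix anything you find.",
    "math": "{chat_history}\nRe-derive your result step by step and verify "
            "the arithmetic before giving a final answer.",
    "generic": "{chat_history}\nWould you change anything about your "
               "previous answer? Accuracy is incredibly important.",
}

def route(task: str) -> str:
    text = task.lower()
    if any(k in text for k in ("def ", "function", "bug", "compile")):
        return "code"
    if any(k in text for k in ("integral", "solve", "equation", "sum")):
        return "math"
    return "generic"

print(route("Solve the equation x^2 = 4"))   # → math
print(route("Fix the bug in this function")) # → code
```

Then "which generic prompt is best" and "does routing beat generic" become two benchy runs over the same task set.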
Also with @2+ passes you can make N async calls to different providers simultaneously, trigger the 2nd layer after n/2 (or some threshold) have replied, and have the 2nd layer choose the best solution or return a hybrid/fixed final response to benchy.
... Like a MoE approach with an auction/race: use super fast and cheap models, and you might get the job done cheaper/faster with accuracy matching or exceeding the big models
Have never seen that being benched
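The race idea above could be sketched like this — the providers and the judge are simulated stand-ins (real ones would be benchy's API clients and a 2nd-layer model), but the fire-N, wait-for-thresh, cancel-the-rest mechanics are real asyncio:

```python
import asyncio

# Hypothetical race: fire N provider calls at once, move on after `thresh`
# have replied, cancel the stragglers, and let a 2nd-layer judge pick.
async def race(providers, prompt, thresh, judge):
    tasks = [asyncio.create_task(p(prompt)) for p in providers]
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_COMPLETED)
    while len(done) < thresh:  # keep waiting until thresh calls finished
        more, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        done |= more
    for t in pending:
        t.cancel()  # drop the slow/expensive stragglers
    return judge([t.result() for t in done])

def make_provider(name, delay):
    async def call(prompt):
        await asyncio.sleep(delay)  # simulated provider latency
        return f"{name}: {prompt}"
    return call

providers = [make_provider(f"p{i}", 0.01 * i) for i in range(4)]
best = asyncio.run(race(providers, "task", thresh=2,
                        judge=lambda answers: sorted(answers)[0]))
print(best)  # the two fastest providers finish; judge picks "p0: task"
```

Benchy could then log latency and accuracy per (thresh, judge) combo to see where the cheap-model swarm crosses the big-model baseline.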
Lmk what you think
All the best!