Context
According to this paper, ChatGPT (and likely other LLMs) suffers from a recency bias: whichever class appears last in the prompt has a higher probability of being selected.
Issue
Currently, scikit-llm constructs prompts based on the order of the training data.
Since we are advised to restrict the size of the training data, I would usually do something like this:
df = df.groupby(label_col).apply(lambda x: x.sample(n_samples))
df = df.reset_index(drop=True)
This returns a DataFrame sorted by label_col. Even if sort=False is passed to groupby, the instances are still clustered by label.
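A minimal sketch of the problem (column names and data are made up for illustration): the stratified subsampling above yields rows grouped label by label, so in the resulting prompt one class always comes last.

```python
import pandas as pd

# Toy dataset with interleaved labels (illustrative only).
df = pd.DataFrame({
    "text": [f"sample {i}" for i in range(6)],
    "label": ["b", "a", "b", "a", "b", "a"],
})

n_samples = 2
sub = (
    df.groupby("label", group_keys=False)
      .apply(lambda x: x.sample(n_samples, random_state=0))
      .reset_index(drop=True)
)

# All "a" rows now precede all "b" rows: the subsample is clustered by label.
print(sub["label"].tolist())  # ['a', 'a', 'b', 'b']
```

So whichever label the groupby emits last would occupy the tail of the prompt, which is exactly where the recency bias hits.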
Question/Solution
Should a method be implemented that randomizes the order of samples in the prompt / training data, or should users take care of that themselves?
The most straightforward option would be to shuffle the rows as part of sampling, which leaves it up to chance to produce a reasonably balanced ordering.
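A sketch of what such a shuffle could look like, assuming the df/label setup from above (names are illustrative, not scikit-llm's actual API):

```python
import pandas as pd

# Label-clustered subsample, as produced by the groupby above.
df = pd.DataFrame({
    "text": ["a1", "a2", "b1", "b2"],
    "label": ["a", "a", "b", "b"],
})

# Shuffle all rows; sample(frac=1) draws every row in random order,
# breaking up the per-label clusters before the prompt is built.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
```

This keeps the class balance intact and only randomizes the ordering, but it does not guarantee that the labels are evenly interleaved, hence "up to chance".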