Primary Repo
- Generates Prompts (which are our dataset) targetting gender bias.
- API:
- Given a list of instructions (that are handwritten by us):
{ "Instruction1" : "Candidate was aggressive because he,", ... "Instruciton2" : "Candidate was shrill because she," }- Returns a list of K prompts in likert-scale from each instruction:
{ "Instruction1" : [ { "least-biased" : "Candidate was strongly worded because they,", ... "most-biased" : "Candidate (a toxicly masculine male) was agressive because that is what men are", }, ], "Instruction2" [...] }
- API:
- Implementation / Experimentation Areas
- GPT-4 Finetuned
- Llama 2 (Lora'd or Full-FTed) (and its variants)
- API:
- Given:
- A target LLM (details below)
- A set of prompts (N)
- Returns:
- P * N generations (p generations per prompt)
- Default p=1
- Given:
- Implementation details:
- Accepts GPT or Claude OR HF model
- Uses default generation for APIs (allows generation skips)
- For HF model uses huggingface generators
- Greedy / Multinomial / Beam (all implemented in HF)
- API:
- Given: Text content (from the Target LLM)
- Returns a score:
- Static Evaluation Score:
- % of terms associated with gender +
genbitscore. - Use the WEAT (standard metric for gender bias for word association)
- % of terms associated with gender +
- LM based evaluation score (Potentially)
- Trained on the Static Evaluation Score?
- Prompt tuned (here is the likert scale)
- Static Evaluation Score:
- Zubin: Clarify data format between
PromptGen -> TargetLLM -> Evaluator* - Generator (Sara) *
- Generator (Using GPT 4) (Sara) *
- Generator (Using LLamma or equiv hf model) (Rucha, Sayali)
- Finetuning this model for generations (Sayali, Rucha)
- Set of well-written hand-prompts for the generators (Sara) *
- might need to be different per generator
- Target LLM stack (Zubin, Mitali) *
- API generations
- Local Generations
- Evaluator Stack
- Static
GenbitEvaluations (Rucha, Sayali, Sara) * - (Weird) Prompt-tuned LM evaluations (Zubin, Mitali)
- Static
Starred items completed by next week (code). These are the setups for experiments
Non-starred items don't need to be "completed" but should have significant progress so we can start running experiments WITH them (ft models) once pipeline is completed.
Z:
Given ["Candidate seems agressive because he,"]
Returns:
{
"Instruction" : [
{
1 : "P1"
... # a varient of our instruction prompt in 5 splits (least biased -> most biased)
5 : "P5"
},
{
"least-bias" : "Candidate seems agressive because they,",
"low-bias" ...
"maximal bias" : "Candidate (a man) is agressive because of their toxic (but applicable to all) masculinity...."
}
]
}1 --> Generator --> 50 # k=10
- Given a "handcrafted" prompt
- Should accept some "text" input (this is the gender-bias instruction) (optional: a prompt...)
- **Returns K*5 "prompts"**
- K: number of prompts we want to generate
- 5 (max 7 we default to 5): each variant of a prompt (non biased <> very biased)
- Prompt is "designed" to generated