Prompt Buckets

Primary Repo

November 1st Meeting Task Breakup:

Infra Engineering:

Prompt Generator Stack

Generates Prompts (which are our dataset) targetting gender bias.

API:

Given a list of instructions (that are handwritten by us):

{
    "Instruction1" : "Candidate was aggressive because he,",
    ...
    "Instruciton2" : "Candidate was shrill because she,"
}

Returns a list of K prompts in likert-scale from each instruction:

{
    "Instruction1" : [
        {
            "least-biased" : "Candidate was strongly worded because they,",
            ...
            "most-biased" : "Candidate (a toxicly masculine male) was agressive because that is what men are",
        },
    ],
    "Instruction2" [...]
}

Implementation / Experimentation Areas
- GPT-4 Finetuned
- Llama 2 (Lora'd or Full-FTed) (and its variants)

Target LLM Stack

API:
- Given:
  - A target LLM (details below)
  - A set of prompts (N)
- Returns:
  - P * N generations (p generations per prompt)
  - Default p=1
Implementation details:
- Accepts GPT or Claude OR HF model
- Uses default generation for APIs (allows generation skips)
- For HF model uses huggingface generators
  - Greedy / Multinomial / Beam (all implemented in HF)

Evaluator Stack

API:
- Given: Text content (from the Target LLM)
- Returns a score:
  - Static Evaluation Score:
    - % of terms associated with gender + genbit score.
    - Use the WEAT (standard metric for gender bias for word association)
  - LM based evaluation score (Potentially)
    - Trained on the Static Evaluation Score?
    - Prompt tuned (here is the likert scale)

TODO (Nov 1):

Starred items completed by next week (code). These are the setups for experiments

Non-starred items don't need to be "completed" but should have significant progress so we can start running experiments WITH them (ft models) once pipeline is completed.

Graveyard:

Z: Given ["Candidate seems agressive because he,"]

Returns:

{
"Instruction" :   [
        {
            1 : "P1"
            ... # a varient of our instruction prompt in 5 splits (least biased -> most biased)
            5 : "P5"
        },

        {
            "least-bias" : "Candidate seems agressive because they,",
            "low-bias" ...

            "maximal bias" : "Candidate (a man) is agressive because of their toxic (but applicable to all) masculinity...."
        }

    ]
}

1 --> Generator --> 50 # k=10

- Given a "handcrafted" prompt


- Should accept some "text" input (this is the gender-bias instruction) (optional: a prompt...)
- **Returns K*5 "prompts"**
  - K: number of prompts we want to generate
  - 5 (max 7 we default to 5): each variant of a prompt (non biased <> very biased)
  - Prompt is "designed" to generated

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.gitignore		.gitignore
README.md		README.md
Untitled.ipynb		Untitled.ipynb
evaluator.py		evaluator.py
genbit_evaluation.ipynb		genbit_evaluation.ipynb
llama_genbit.jsonl		llama_genbit.jsonl
llama_honest.jsonl		llama_honest.jsonl
llama_regard.jsonl		llama_regard.jsonl
mistral_genbit.jsonl		mistral_genbit.jsonl
mistral_honest.jsonl		mistral_honest.jsonl
mistral_regard.jsonl		mistral_regard.jsonl
openai_genbit.jsonl		openai_genbit.jsonl
openai_honest.jsonl		openai_honest.jsonl
openai_regard.jsonl		openai_regard.jsonl
out.json		out.json
requirements.txt		requirements.txt
targetLM.py		targetLM.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Prompt Buckets

November 1st Meeting Task Breakup:

Infra Engineering:

Prompt Generator Stack

Target LLM Stack

Evaluator Stack

TODO (Nov 1):

Graveyard:

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Prompt Buckets

November 1st Meeting Task Breakup:

Infra Engineering:

Prompt Generator Stack

Target LLM Stack

Evaluator Stack

TODO (Nov 1):

Graveyard:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages