- LIMA: Less Is More for Alignment
- Textbooks Are All You Need
- GPT-3: Language Models are Few-Shot Learners
- GPT Self-Supervision for a Better Data Annotator
- Let's Verify Step by Step
- Training language models to follow instructions with human feedback
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Scaling Laws for Reward Model Overoptimization
- AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
- RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
- RRHF: Rank Responses to Align Language Models with Human Feedback without tears
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- Scaling Instruction-Finetuned Language Models
- SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions
- WizardLM: Empowering Large Language Models to Follow Complex Instructions
- WizardCoder: Empowering Code Large Language Models with Evol-Instruct
- Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation
- Alpaca: A Strong, Replicable Instruction-Following Model
- How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
