
    Repositories list

    • Self-prediction vs cross-prediction experiment on AI psychosis red-teaming scores
      HTML
      Updated Apr 13, 2026
    • Disentangling incapability from scheming in LLMs self-predicting their agentic trajectories. Measuring self-predicting capabilities for emergent misalignment fa…
      Python
      MIT License
      Updated Apr 3, 2026
    • RL training of Gemma 2 2B IT for calibrated YES/NO probability estimates on BoolQ using GRPO
      Python
      Updated Mar 27, 2026
    • Python
      Updated Mar 17, 2026
    • Python
      Updated Mar 5, 2026
    • Framework for testing LLMs' ability to predict their own behavior in multi-turn and agentic scenarios
      Python
      Updated Mar 3, 2026
    • We have a list of base_prompts, e.g. "What is 2+2?", and a prefix wrapper: WRAPPER = 'What would you say in response to this prompt: "{p}"'. We compare the…
      Python
      Updated Feb 26, 2026
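The prompt-wrapping step in the last description can be sketched as follows. This is a minimal illustration, not the repository's code. It assumes the `{p}` placeholder in WRAPPER is filled with Python's `str.format`, and the helper `wrap` and the `pairs` list are hypothetical names. The description is truncated before it states what is compared, so the sketch stops at building (base prompt, wrapped prompt) pairs.

```python
# Hypothetical sketch: base_prompts and WRAPPER follow the repository
# description; wrap() and pairs are illustrative names, not from the repo.
WRAPPER = 'What would you say in response to this prompt: "{p}"'

base_prompts = ["What is 2+2?"]


def wrap(prompt: str) -> str:
    """Embed a base prompt in the wrapper by substituting the {p} placeholder."""
    return WRAPPER.format(p=prompt)


# Pair each base prompt with its wrapped version; what is then compared
# between the two is not specified in the (truncated) description.
pairs = [(p, wrap(p)) for p in base_prompts]

for base, wrapped in pairs:
    print(base)
    print(wrapped)
```

Because `str.format` does not re-parse the substituted value, base prompts that themselves contain braces are inserted verbatim.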