Reverse-engineering hidden backdoor triggers in three 671B-parameter DeepSeek-V3 language models, using activation-space probing, SVD weight analysis, and absorbed MLA (Multi-head Latent Attention) SVD, for the Jane Street Dormant LLM Puzzle.
puzzle svd jane-street ml-security mechanistic-interpretability deepseek-v3 sleeper-agents activation-probing llm-backdoor
Updated Apr 2, 2026 - Python