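Replaces the external load-balancing constraints of conventional MoE with endogenous evolutionary pressure based on a "lagged gradient game", enabling self-organized specialization and hierarchical organization of the neural network topology.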
pytorch deeplearning interpretability multi-task-learning mixture-of-experts conditional-computation disentanglement-learning gradient-conflict
Updated Dec 25, 2025 - Python