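2828< meta property ="og:description " content ="Task 1: Root Mean Square Layer Normalization (RMS Norm) Root Mean Square Layer Normalization (RMS Norm) is the most widely used normalization module in deep learning, especially in natural language processing (NLP) and large language models (LLMs). The module takes a tensor of shape [batch_size, seqlen, hidden_size] as input (denoted X, with shape [b, s, h]) and, along the hidden dimension h, applies root-mean-square normalization with a learnable scaling transform to produce the output Y, with shape ">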
2929< meta property ="og:locale " content ="zh_CN ">
3030< meta property ="article:published_time " content ="2025-06-17T06:17:05.000Z ">
31- < meta property ="article:modified_time " content ="2025-06-17T07:59:19.357Z ">
31+ < meta property ="article:modified_time " content ="2025-06-17T07:59:55.884Z ">
3232< meta property ="article:author " content ="DeepEngine ">
3333< meta property ="article:tag " content ="RMSNorm ">
3434< meta property ="article:tag " content ="Vocab Embedding ">
@@ -181,7 +181,7 @@ <h1 class="post-title" itemprop="name headline">
181181 < span class ="post-meta-item-text "> Posted on</ span >
182182
183183
184- < time title ="Created: 2025-06-17 14:17:05 / Modified: 15:59:19 " itemprop ="dateCreated datePublished " datetime ="2025-06-17T14:17:05+08:00 "> 2025-06-17</ time >
184+ < time title ="Created: 2025-06-17 14:17:05 / Modified: 15:59:55 " itemprop ="dateCreated datePublished " datetime ="2025-06-17T14:17:05+08:00 "> 2025-06-17</ time >
185185 </ span >
186186 < span class ="post-meta-item ">
187187 < span class ="post-meta-item-icon ">
@@ -213,9 +213,7 @@ <h4 id="task-1-均方根层归一化-rms-norm">Task 1: 均方根层归一化 (RM
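213213< code > h</ code > dimension, applying root-mean-square normalization with a learnable scaling transform to produce the output
214214< code > Y</ code > of shape < code > [b, s, h]</ code > . The formulas are as follows:</ p >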
215215< p > $$ Y = \frac{X}{RMS[X]} \odot \gamma $$</ p >
216- < p > < span class ="math display "> $$
217- RMS[X]=\sqrt{\frac{1}{h} \sum_{i=1}^{h}x_i^2 + \epsilon}
218- $$</ span > </ p >
216+ < p > $$ RMS[X]=\sqrt{\frac{1}{h} \sum_{i=1}^{h}x_i^2 + \epsilon} $$</ p >
219217< p > where < span
220218class ="math inline "> < em > R</ em > < em > M</ em > < em > S</ em > [< em > X</ em > ]</ span >
221219denotes the root mean square of < code > X</ code > ; for < code > i in batch_size</ code > and
@@ -236,12 +234,8 @@ <h4 id="task-1-均方根层归一化-rms-norm">Task 1: 均方根层归一化 (RM
236234class ="math inline "> < em > γ</ em > </ span > along the hidden dimension < code > h</ code >
237235evenly into < code > Xg</ code > groups, and to each group < code > i</ code > applies, per equations < span
238236class ="math inline "> (1)(2)</ span > , the < em > RMS Norm</ em >
239- operation, as follows: < span class ="math display "> $$
240- Y_{g_i}=\frac{X_{g_i}}{RMS[X_{g_i}]} \odot \gamma_{g_i}
241- $$</ span > </ p >
242- < p > < span class ="math display "> $$
243- RMS[X_{g_i}]=\sqrt{\frac{1}{gz} \sum_{j=1}^{gz}x_{g_i, j}^2 + \epsilon}
244- $$</ span > </ p >
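237+ operation, as follows: $$ Y_{g_i}=\frac{X_{g_i}}{RMS[X_{g_i}]} \odot \gamma_{g_i} $$</ p >
238+ < p > $$ RMS[X_{g_i}]=\sqrt{\frac{1}{gz} \sum_{j=1}^{gz}x_{g_i, j}^2 + \epsilon} $$</ p >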
245239< p > In addition, we should implement for this < em > Group RMS Norm</ em > module a parameter initialization method named
246240< code > reset_parameters</ code > , used for the learnable parameter matrix
247241< span class ="math inline "> < em > γ</ em > </ span >
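For readers who want to check the Group RMS Norm formulas quoted in the diff, here is a minimal pure-Python sketch for a single hidden vector. It is not the post's implementation: the function names `rms_norm` and `group_rms_norm`, the parameter name `group_size` (standing in for `gz`), and the default `eps=1e-5` are all assumptions made for illustration. As in the formulas, `eps` is added inside the square root.

```python
import math


def rms_norm(x, gamma, eps=1e-5):
    """Apply Y = X / RMS[X] * gamma to one vector (lists of floats)."""
    if len(x) != len(gamma):
        raise ValueError("x and gamma must have the same length")
    # RMS[X] = sqrt(mean(x_i^2) + eps); eps sits inside the square root.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * g for v, g in zip(x, gamma)]


def group_rms_norm(x, gamma, group_size, eps=1e-5):
    """Split the hidden dimension into equal groups and normalize each one.

    `group_size` plays the role of `gz` in the formulas (hypothetical name).
    """
    h = len(x)
    if group_size <= 0 or h % group_size != 0:
        raise ValueError("hidden size must be divisible by group_size")
    y = []
    for start in range(0, h, group_size):
        end = start + group_size
        # Each group uses its own RMS and its own slice of gamma.
        y.extend(rms_norm(x[start:end], gamma[start:end], eps))
    return y


if __name__ == "__main__":
    x = [3.0, 4.0, 6.0, 8.0]
    gamma = [1.0, 1.0, 1.0, 1.0]
    print(group_rms_norm(x, gamma, group_size=2, eps=0.0))
```

With `gamma` set to all ones and `eps=0.0`, every normalized group has a mean square of 1, which makes a quick sanity check. A real module would operate on `[b, s, h]` tensors and vectorize over the batch and sequence dimensions instead of looping.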