Commit 8965074

committed
Site updated: 2025-07-15 14:28:50
1 parent 6176948 commit 8965074

27 files changed, +1928 -49 lines changed

2025/06/03/hello-world/index.html

Lines changed: 2 additions & 2 deletions
@@ -292,7 +292,7 @@ <h3 id="Deploy-to-remote-sites"><a href="#Deploy-to-remote-sites" class="headerl
 <div class="site-state-item site-state-posts">
 <a href="/LLM-Blog/archives/">
 
-<span class="site-state-item-count">5</span>
+<span class="site-state-item-count">6</span>
 <span class="site-state-item-name">Posts</span>
 </a>
 </div>
@@ -303,7 +303,7 @@ <h3 id="Deploy-to-remote-sites"><a href="#Deploy-to-remote-sites" class="headerl
 <span class="site-state-item-name">Categories</span></a>
 </div>
 <div class="site-state-item site-state-tags">
-<span class="site-state-item-count">7</span>
+<span class="site-state-item-count">9</span>
 <span class="site-state-item-name">Tags</span>
 </div>
 </nav>

2025/06/10/A0-onboarding/index.html

Lines changed: 2 additions & 2 deletions
@@ -525,7 +525,7 @@ <h1 id="Contact"><a href="#Contact" class="headerlink" title="Contact"></a>Conta
 <div class="site-state-item site-state-posts">
 <a href="/LLM-Blog/archives/">
 
-<span class="site-state-item-count">5</span>
+<span class="site-state-item-count">6</span>
 <span class="site-state-item-name">Posts</span>
 </a>
 </div>
@@ -536,7 +536,7 @@ <h1 id="Contact"><a href="#Contact" class="headerlink" title="Contact"></a>Conta
 <span class="site-state-item-name">Categories</span></a>
 </div>
 <div class="site-state-item site-state-tags">
-<span class="site-state-item-count">7</span>
+<span class="site-state-item-count">9</span>
 <span class="site-state-item-name">Tags</span>
 </div>
 </nav>

2025/06/14/A1-matmul/index.html

Lines changed: 2 additions & 2 deletions
@@ -335,7 +335,7 @@ <h5 id="TODO-2"><a href="#TODO-2" class="headerlink" title="TODO"></a>TODO</h5><
 <div class="site-state-item site-state-posts">
 <a href="/LLM-Blog/archives/">
 
-<span class="site-state-item-count">5</span>
+<span class="site-state-item-count">6</span>
 <span class="site-state-item-name">Posts</span>
 </a>
 </div>
@@ -346,7 +346,7 @@ <h5 id="TODO-2"><a href="#TODO-2" class="headerlink" title="TODO"></a>TODO</h5><
 <span class="site-state-item-name">Categories</span></a>
 </div>
 <div class="site-state-item site-state-tags">
-<span class="site-state-item-count">7</span>
+<span class="site-state-item-count">9</span>
 <span class="site-state-item-name">Tags</span>
 </div>
 </nav>

2025/06/17/A2-norm-emb/index.html

Lines changed: 2 additions & 2 deletions
@@ -527,7 +527,7 @@ <h5 id="TODO-3"><a href="#TODO-3" class="headerlink" title="TODO"></a>TODO</h5><
 <div class="site-state-item site-state-posts">
 <a href="/LLM-Blog/archives/">
 
-<span class="site-state-item-count">5</span>
+<span class="site-state-item-count">6</span>
 <span class="site-state-item-name">Posts</span>
 </a>
 </div>
@@ -538,7 +538,7 @@ <h5 id="TODO-3"><a href="#TODO-3" class="headerlink" title="TODO"></a>TODO</h5><
 <span class="site-state-item-name">Categories</span></a>
 </div>
 <div class="site-state-item site-state-tags">
-<span class="site-state-item-count">7</span>
+<span class="site-state-item-count">9</span>
 <span class="site-state-item-name">Tags</span>
 </div>
 </nav>

2025/06/29/A3-modeling-mlp/index.html

Lines changed: 9 additions & 6 deletions
@@ -28,7 +28,7 @@
 <meta property="og:description" content="In this assignment we continue the Modeling task to help you understand the building blocks of the Transformer in more depth. This time we focus on one of the key layers at the core of the Transformer architecture: the MLP layer. Task 1: Dense MLP The Multi-Layer Perceptron (MLP) module is a basic building block in deep learning, particularly suited to tasks involving complex patterns and non-linear relationships. It has been widely used in Transformer-based">
 <meta property="og:locale" content="zh_CN">
 <meta property="article:published_time" content="2025-06-29T13:54:26.000Z">
-<meta property="article:modified_time" content="2025-06-30T09:37:16.053Z">
+<meta property="article:modified_time" content="2025-06-30T09:39:08.458Z">
 <meta property="article:author" content="DeepEngine">
 <meta property="article:tag" content="MLP">
 <meta property="article:tag" content="LoRA">
@@ -187,7 +187,7 @@ <h1 class="post-title" itemprop="name headline">
 <i class="far fa-calendar-check"></i>
 </span>
 <span class="post-meta-item-text">Updated on</span>
-<time title="Modified: 2025-06-30 17:37:16" itemprop="dateModified" datetime="2025-06-30T17:37:16+08:00">2025-06-30</time>
+<time title="Modified: 2025-06-30 17:39:08" itemprop="dateModified" datetime="2025-06-30T17:39:08+08:00">2025-06-30</time>
 </span>
 <span class="post-meta-item">
 <span class="post-meta-item-icon">
@@ -273,7 +273,7 @@ <h1 id="Dense-MLP-小结"><a href="#Dense-MLP-小结" class="headerlink" title="
 <li>Take input X and run the forward pass of the GLU-style MLP with LoRA adapters, with the activation function specified by <code>activation_type</code>;</li>
 <li>The final output <code>hidden_states</code>, denoted O, should have the same shape as X.</li>
 </ol>
-<h1 id="Task-3-Sparse-MLP"><a href="#Task-3-Sparse-MLP" class="headerlink" title="Task 3: Sparse MLP"></a>Task 3: Sparse MLP</h1><p>Building on the DenseMLPWithLoRA module implemented in Tasks 1 and 2, we now implement a SparseMLPWithLoRA module based on the mainstream Mixture-of-Experts (MoE) architecture (see the references for details). First, a <strong>Dense</strong> MLP module usually refers to a standard structure: it up-projects the <code>hidden_states</code> $\mathbf{X}$ from dimension <code>h</code> to a higher dimension <code>ffh</code>, then down-projects back to the original dimension through a <code>gating</code> mechanism.</p>
+<h1 id="Task-3-Sparse-MLP"><a href="#Task-3-Sparse-MLP" class="headerlink" title="Task 3: Sparse MLP"></a>Task 3: Sparse MLP</h1><p>Building on the <code>DenseMLPWithLoRA</code> module implemented in Tasks 1 and 2, we now implement a <code>SparseMLPWithLoRA</code> module based on the mainstream Mixture-of-Experts (MoE) architecture (see the references for details). First, a <strong>Dense</strong> MLP module usually refers to a standard structure: it up-projects the <code>hidden_states</code> $\mathbf{X}$ from dimension <code>h</code> to a higher dimension <code>ffh</code>, then down-projects back to the original dimension through a <code>gating</code> mechanism.</p>
 <p>A <strong>Sparse</strong> MLP module, much like the multi-head mechanism in the attention module, splits the <code>ffh</code> dimension of the projection matrices into <code>ne</code> equal-sized <code>shard</code>s, each of size <code>e = ffh // ne</code> and corresponding to one expert $E_i$, where $i \in [0, …, \text{ne}−1]$ (<code>ne</code> is the number of experts). So, unlike a traditional Dense projection over the full <code>ffh</code> dimension, in the <code>SparseMLPWithLoRA</code> module each <code>token</code> in the <code>hidden_states</code> $\mathbf{X}$ is mapped by a <strong>routing mechanism</strong> to only <code>k</code> specific experts, and each expert handles only its own <code>e</code>-dimensional subspace (in this module you can simply <strong>model each expert as a small DenseMLPWithLoRA module</strong> with its <code>ffh_size</code> parameter set to <code>e</code>). The final output for each <code>token</code> is the weighted sum of the sub-outputs from these <code>k</code> experts. This achieves two goals at once:</p>
 <ul>
 <li>Reduce the cost of high-dimensional computation;</li>
@@ -343,7 +343,10 @@ <h2 id="TODO-补充参考文献"><a href="#TODO-补充参考文献" class="heade
 <a href="/LLM-Blog/2025/06/17/A2-norm-emb/" rel="prev" title="A2 RMSNorm and Embedding">
 <i class="fa fa-chevron-left"></i> A2 RMSNorm and Embedding
 </a></div>
-<div class="post-nav-item"></div>
+<div class="post-nav-item">
+<a href="/LLM-Blog/2025/07/13/A4-attention-module/" rel="next" title="A4 Attention Module">
+A4 Attention Module <i class="fa fa-chevron-right"></i>
+</a></div>
 </div>
 </footer>
 
@@ -416,7 +419,7 @@ <h2 id="TODO-补充参考文献"><a href="#TODO-补充参考文献" class="heade
 <div class="site-state-item site-state-posts">
 <a href="/LLM-Blog/archives/">
 
-<span class="site-state-item-count">5</span>
+<span class="site-state-item-count">6</span>
 <span class="site-state-item-name">Posts</span>
 </a>
 </div>
@@ -427,7 +430,7 @@ <h2 id="TODO-补充参考文献"><a href="#TODO-补充参考文献" class="heade
 <span class="site-state-item-name">Categories</span></a>
 </div>
 <div class="site-state-item site-state-tags">
-<span class="site-state-item-count">7</span>
+<span class="site-state-item-count">9</span>
 <span class="site-state-item-name">Tags</span>
 </div>
 </nav>
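
Note on the Task 3 text touched in 2025/06/29/A3-modeling-mlp/index.html: the paragraph describes splitting `ffh` into `ne` experts of size `e = ffh // ne`, routing each token to `k` experts, and summing their outputs with weights. A minimal pure-Python sketch of that idea follows. It is not the assignment's reference solution. The names `glu_mlp`, `sparse_mlp`, `make_experts`, and `W_router` are hypothetical, the activation is assumed to be SiLU, the router is assumed to be a softmax over a linear projection with weights renormalised over the chosen experts, and the LoRA adapters are left out for brevity.

```python
import math
import random


def silu(x):
    # SiLU activation (assumed here; the post lets activation_type choose it).
    return x / (1.0 + math.exp(-x))


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def glu_mlp(x, W_up, W_gate, W_down):
    # GLU-style MLP for one token: down(act(gate(x)) * up(x)).
    up = matvec(W_up, x)            # h -> e
    gate = matvec(W_gate, x)        # h -> e
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)   # e -> h


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def make_experts(h, ffh, ne, seed=0):
    # Each expert is a small GLU MLP whose hidden size is e = ffh // ne.
    rng = random.Random(seed)
    e = ffh // ne

    def rand(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    return [(rand(e, h), rand(e, h), rand(h, e)) for _ in range(ne)]


def sparse_mlp(x, experts, W_router, k):
    # Route one token to its top-k experts and return their weighted sum.
    probs = softmax(matvec(W_router, x))    # one routing score per expert
    order = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)
    top = order[:k]
    norm = sum(probs[i] for i in top)       # renormalise over chosen experts (assumption)
    out = [0.0] * len(x)
    for i in top:
        y = glu_mlp(x, *experts[i])
        w = probs[i] / norm
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top


if __name__ == "__main__":
    h, ffh, ne, k = 4, 8, 4, 2
    experts = make_experts(h, ffh, ne)
    rng = random.Random(1)
    W_router = [[rng.uniform(-1, 1) for _ in range(h)] for _ in range(ne)]
    out, top = sparse_mlp([0.5, -1.0, 0.25, 2.0], experts, W_router, k)
    print("chosen experts:", top)
    print("output:", out)
```

The output keeps the input's length `h`, matching the requirement that O has the same shape as X, and only `k` of the `ne` experts are evaluated per token.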

2025/07/13/A4-attention-module/bottom-right.svg

Lines changed: 4 additions & 0 deletions
