Commit a64c646

committed
Site updated: 2025-06-14 13:41:55
1 parent 01588a9 commit a64c646

9 files changed

+816
-44
lines changed

2025/06/14/A1-matmul/index.html

Lines changed: 25 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -20,16 +20,18 @@
2020
var CONFIG = {"hostname":"big-trex.github.io","root":"/LLM-Blog/","scheme":"Pisces","version":"7.8.0","exturl":false,"sidebar":{"position":"left","display":"post","padding":18,"offset":12,"onmobile":false},"copycode":{"enable":false,"show_result":false,"style":null},"back2top":{"enable":true,"sidebar":false,"scrollpercent":false},"bookmark":{"enable":false,"color":"#222","save":"auto"},"fancybox":false,"mediumzoom":false,"lazyload":false,"pangu":false,"comments":{"style":"tabs","active":null,"storage":true,"lazyload":false,"nav":null},"algolia":{"hits":{"per_page":10},"labels":{"input_placeholder":"Search for Posts","hits_empty":"We didn't find any results for the search: ${query}","hits_stats":"${hits} results found in ${time} ms"}},"localsearch":{"enable":false,"trigger":"auto","top_n_per_article":1,"unescape":false,"preload":false},"motion":{"enable":true,"async":false,"transition":{"post_block":"fadeIn","post_header":"slideDownIn","post_body":"slideDownIn","coll_header":"slideLeftIn","sidebar":"slideUpIn"}}};
2121
</script>
2222

23-
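<meta name="description" content="Task 1: MatMul with multi-head variant. In Task 1 we implement the logic for multiplying two matrices. We have the following two matrices: A1: a 3D input tensor of shape [batch_size, seq_len, hidden_size], where batch_size is the number of sequences, seq_len is the maximum length of a sequence, and hidden_size is the dimension of each token in a sequence">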
23+
<meta name="description" content="Assignment for A1">
2424
<meta property="og:type" content="article">
2525
<meta property="og:title" content="A1 matmul">
2626
<meta property="og:url" content="https://big-trex.github.io/2025/06/14/A1-matmul/index.html">
2727
<meta property="og:site_name" content="LLM-Assignment-Doc">
28-
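<meta property="og:description" content="Task 1: MatMul with multi-head variant. In Task 1 we implement the logic for multiplying two matrices. We have the following two matrices: A1: a 3D input tensor of shape [batch_size, seq_len, hidden_size], where batch_size is the number of sequences, seq_len is the maximum length of a sequence, and hidden_size is the dimension of each token in a sequence">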
28+
<meta property="og:description" content="Assignment for A1">
2929
<meta property="og:locale" content="zh_CN">
3030
<meta property="article:published_time" content="2025-06-14T04:57:11.000Z">
31-
<meta property="article:modified_time" content="2025-06-14T05:13:23.517Z">
31+
<meta property="article:modified_time" content="2025-06-14T05:41:44.652Z">
3232
<meta property="article:author" content="DeepEngine">
33+
<meta property="article:tag" content="Matmul">
34+
<meta property="article:tag" content="Multi-head">
3335
<meta name="twitter:card" content="summary">
3436

3537
<link rel="canonical" href="https://big-trex.github.io/2025/06/14/A1-matmul/">
@@ -116,6 +118,16 @@ <h1 class="site-title">LLM-Assignment-Doc</h1>
116118

117119
<a href="/LLM-Blog/" rel="section"><i class="fa fa-home fa-fw"></i>Home</a>
118120

121+
</li>
122+
<li class="menu-item menu-item-about">
123+
124+
<a href="/LLM-Blog/about/" rel="section"><i class="fa fa-user fa-fw"></i>About</a>
125+
126+
</li>
127+
<li class="menu-item menu-item-categories">
128+
129+
<a href="/LLM-Blog/categories/" rel="section"><i class="fa fa-th fa-fw"></i>Categories</a>
130+
119131
</li>
120132
<li class="menu-item menu-item-archives">
121133

@@ -174,10 +186,11 @@ <h1 class="post-title" itemprop="name headline">
174186
<span class="post-meta-item-text">Posted on</span>
175187

176188

177-
<time title="Created: 2025-06-14 12:57:11 / Modified: 13:13:23" itemprop="dateCreated datePublished" datetime="2025-06-14T12:57:11+08:00">2025-06-14</time>
189+
<time title="Created: 2025-06-14 12:57:11 / Modified: 13:41:44" itemprop="dateCreated datePublished" datetime="2025-06-14T12:57:11+08:00">2025-06-14</time>
178190
</span>
179191

180192

193+
<div class="post-description">Assignment for A1</div>
181194

182195
</div>
183196
</header>
@@ -233,6 +246,10 @@ <h5 id="TODO-2"><a href="#TODO-2" class="headerlink" title="TODO"></a>TODO</h5><
233246

234247

235248
<footer class="post-footer">
249+
<div class="post-tags">
250+
<a href="/LLM-Blog/tags/Mutmal/" rel="tag"># Matmul</a>
251+
<a href="/LLM-Blog/tags/Multi-head/" rel="tag"># Multi-head</a>
252+
</div>
236253

237254

238255

@@ -320,6 +337,10 @@ <h5 id="TODO-2"><a href="#TODO-2" class="headerlink" title="TODO"></a>TODO</h5><
320337
<span class="site-state-item-name">posts</span>
321338
</a>
322339
</div>
340+
<div class="site-state-item site-state-tags">
341+
<span class="site-state-item-count">2</span>
342+
<span class="site-state-item-name">tags</span>
343+
</div>
323344
</nav>
324345
</div>
325346

2025/06/14/hello-world/index.html

Lines changed: 14 additions & 0 deletions
@@ -116,6 +116,16 @@ <h1 class="site-title">LLM-Assignment-Doc</h1>
116116

117117
<a href="/LLM-Blog/" rel="section"><i class="fa fa-home fa-fw"></i>Home</a>
118118

119+
</li>
120+
<li class="menu-item menu-item-about">
121+
122+
<a href="/LLM-Blog/about/" rel="section"><i class="fa fa-user fa-fw"></i>About</a>
123+
124+
</li>
125+
<li class="menu-item menu-item-categories">
126+
127+
<a href="/LLM-Blog/categories/" rel="section"><i class="fa fa-th fa-fw"></i>Categories</a>
128+
119129
</li>
120130
<li class="menu-item menu-item-archives">
121131

@@ -295,6 +305,10 @@ <h3 id="Deploy-to-remote-sites"><a href="#Deploy-to-remote-sites" class="headerl
295305
<span class="site-state-item-name">posts</span>
296306
</a>
297307
</div>
308+
<div class="site-state-item site-state-tags">
309+
<span class="site-state-item-count">2</span>
310+
<span class="site-state-item-name">tags</span>
311+
</div>
298312
</nav>
299313
</div>
300314

archives/2025/06/index.html

Lines changed: 14 additions & 0 deletions
@@ -112,6 +112,16 @@ <h1 class="site-title">LLM-Assignment-Doc</h1>
112112

113113
<a href="/LLM-Blog/" rel="section"><i class="fa fa-home fa-fw"></i>Home</a>
114114

115+
</li>
116+
<li class="menu-item menu-item-about">
117+
118+
<a href="/LLM-Blog/about/" rel="section"><i class="fa fa-user fa-fw"></i>About</a>
119+
120+
</li>
121+
<li class="menu-item menu-item-categories">
122+
123+
<a href="/LLM-Blog/categories/" rel="section"><i class="fa fa-th fa-fw"></i>Categories</a>
124+
115125
</li>
116126
<li class="menu-item menu-item-archives">
117127

@@ -272,6 +282,10 @@ <h1 class="site-title">LLM-Assignment-Doc</h1>
272282
<span class="site-state-item-name">posts</span>
273283
</a>
274284
</div>
285+
<div class="site-state-item site-state-tags">
286+
<span class="site-state-item-count">2</span>
287+
<span class="site-state-item-name">tags</span>
288+
</div>
275289
</nav>
276290
</div>
277291

archives/2025/index.html

Lines changed: 14 additions & 0 deletions
@@ -112,6 +112,16 @@ <h1 class="site-title">LLM-Assignment-Doc</h1>
112112

113113
<a href="/LLM-Blog/" rel="section"><i class="fa fa-home fa-fw"></i>Home</a>
114114

115+
</li>
116+
<li class="menu-item menu-item-about">
117+
118+
<a href="/LLM-Blog/about/" rel="section"><i class="fa fa-user fa-fw"></i>About</a>
119+
120+
</li>
121+
<li class="menu-item menu-item-categories">
122+
123+
<a href="/LLM-Blog/categories/" rel="section"><i class="fa fa-th fa-fw"></i>Categories</a>
124+
115125
</li>
116126
<li class="menu-item menu-item-archives">
117127

@@ -272,6 +282,10 @@ <h1 class="site-title">LLM-Assignment-Doc</h1>
272282
<span class="site-state-item-name">posts</span>
273283
</a>
274284
</div>
285+
<div class="site-state-item site-state-tags">
286+
<span class="site-state-item-count">2</span>
287+
<span class="site-state-item-name">tags</span>
288+
</div>
275289
</nav>
276290
</div>
277291

archives/index.html

Lines changed: 14 additions & 0 deletions
@@ -112,6 +112,16 @@ <h1 class="site-title">LLM-Assignment-Doc</h1>
112112

113113
<a href="/LLM-Blog/" rel="section"><i class="fa fa-home fa-fw"></i>Home</a>
114114

115+
</li>
116+
<li class="menu-item menu-item-about">
117+
118+
<a href="/LLM-Blog/about/" rel="section"><i class="fa fa-user fa-fw"></i>About</a>
119+
120+
</li>
121+
<li class="menu-item menu-item-categories">
122+
123+
<a href="/LLM-Blog/categories/" rel="section"><i class="fa fa-th fa-fw"></i>Categories</a>
124+
115125
</li>
116126
<li class="menu-item menu-item-archives">
117127

@@ -272,6 +282,10 @@ <h1 class="site-title">LLM-Assignment-Doc</h1>
272282
<span class="site-state-item-name">posts</span>
273283
</a>
274284
</div>
285+
<div class="site-state-item site-state-tags">
286+
<span class="site-state-item-count">2</span>
287+
<span class="site-state-item-name">tags</span>
288+
</div>
275289
</nav>
276290
</div>
277291

css/main.css

Lines changed: 1 addition & 1 deletion
@@ -1168,7 +1168,7 @@ pre .javascript .function {
11681168
}
11691169
.links-of-author a::before,
11701170
.links-of-author span.exturl::before {
1171-
background: #ff57d7;
1171+
background: #c78312;
11721172
border-radius: 50%;
11731173
content: ' ';
11741174
display: inline-block;

index.html

Lines changed: 24 additions & 39 deletions
@@ -112,6 +112,16 @@ <h1 class="site-title">LLM-Assignment-Doc</h1>
112112

113113
<a href="/LLM-Blog/" rel="section"><i class="fa fa-home fa-fw"></i>Home</a>
114114

115+
</li>
116+
<li class="menu-item menu-item-about">
117+
118+
<a href="/LLM-Blog/about/" rel="section"><i class="fa fa-user fa-fw"></i>About</a>
119+
120+
</li>
121+
<li class="menu-item menu-item-categories">
122+
123+
<a href="/LLM-Blog/categories/" rel="section"><i class="fa fa-th fa-fw"></i>Categories</a>
124+
115125
</li>
116126
<li class="menu-item menu-item-archives">
117127

@@ -170,7 +180,7 @@ <h2 class="post-title" itemprop="name headline">
170180
<span class="post-meta-item-text">Posted on</span>
171181

172182

173-
<time title="Created: 2025-06-14 12:57:11 / Modified: 13:13:23" itemprop="dateCreated datePublished" datetime="2025-06-14T12:57:11+08:00">2025-06-14</time>
183+
<time title="Created: 2025-06-14 12:57:11 / Modified: 13:41:44" itemprop="dateCreated datePublished" datetime="2025-06-14T12:57:11+08:00">2025-06-14</time>
174184
</span>
175185

176186

@@ -184,44 +194,15 @@ <h2 class="post-title" itemprop="name headline">
184194
<div class="post-body" itemprop="articleBody">
185195

186196

187-
<h4 id="Task-1-MalMul-with-multi-head-variant"><a href="#Task-1-MalMul-with-multi-head-variant" class="headerlink" title="Task 1: MatMul with multi-head variant"></a>Task 1: MatMul with multi-head variant</h4><p>In Task 1, we implement the logic for multiplying two matrices. We are given the following two tensors:</p>
188-
<ul>
189-
<li><code>A1</code>: a 3D input tensor of shape <code>[batch_size, seq_len, hidden_size]</code>, where <code>batch_size</code> is the number of sequences, <code>seq_len</code> is the maximum length of a sequence, and <code>hidden_size</code> is the dimension of each <code>token</code> in a sequence. We abbreviate the shape of <code>A1</code> as <code>[b, s, h]</code>.</li>
190-
<li><code>W1</code>: a 2D weight tensor of shape <code>[hidden_size, embed_size]</code>. It is a projection matrix that maps any row vector from <code>hidden_size</code> dimensions to <code>embed_size</code> dimensions. We abbreviate the shape of <code>W1</code> as <code>[h, e]</code>.</li>
191-
</ul>
192-
<p>Naive matrix multiplication iterates only over the <code>batch_size</code> dimension of <code>A1</code>: for each sequence index i, it computes <code>O1[i] = A1[i] @ W1</code>, which yields a tensor <code>O1</code> of shape <code>[b, s, e]</code>.</p>
193-
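<p>In multi-head matrix multiplication, we first split the <code>h</code> dimension of both the input tensor <code>A1</code> and the weight tensor <code>W1</code> evenly into <code>num_heads</code> sub-dimensions (written <code>nh</code>, the number of heads), each of size <code>hd</code> = <code>h</code> / <code>nh</code>. This gives a 4D tensor <code>A2</code> of shape <code>[b, s, nh, hd]</code> and a 3D tensor <code>W2</code> of shape <code>[nh, hd, e]</code>. Then, for each sequence along the <code>batch_size</code> dimension of <code>A2</code>, we take each <code>[s, hd]</code> matrix along its <code>num_heads</code> dimension and multiply it by the corresponding <code>[hd, e]</code> matrix along the <code>num_heads</code> dimension of W2. Computing all heads in parallel produces a 4D tensor <code>O2</code> of shape <code>[b, s, nh, e]</code>.</p>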
194-
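<h5 id="TODO"><a href="#TODO" class="headerlink" title="TODO"></a>TODO</h5><p>Complete the <strong>Task1</strong> part of <code>matmul_with_importance</code>: implement the multi-head matrix multiplication described above, taking the tensors <code>A1</code> and <code>W1</code> as input and returning the result <code>O2</code>.</p>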
195-
<div class="note info">
196-
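<ol><li>The inputs are A1 and W1; you must convert them to A2 and W2 yourself before computing. Pay attention to the usage of, and the differences between, torch functions such as <code>reshape</code>, <code>view</code>, <code>transpose</code>, and <code>permute</code>.</li><li>Although the multiplication is described logically as iteration, do not implement it with for loops. Look up PyTorch's computation functions yourself, such as <code>@</code>, <code>torch.bmm</code>, <code>torch.mm</code>, <code>torch.matmul</code>, and <code>torch.einsum</code>.</li></ol>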
197-
</div>
198-
199-
<div class="note warning">
200-
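<ol><li>All input tensors are randomly initialized from the standard normal distribution N(0, 1) on the same device (CPU or CUDA), share the same data type (float32, float16, or bfloat16), and do not have <code>requires_grad</code> set in any test case.</li><li>In all test cases, <code>hidden_size</code> is guaranteed to be divisible by <code>num_heads</code>.</li></ol>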
201-
</div>
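Review note on the Task 1 text in this hunk: the shape bookkeeping may be easier to check against a tiny loop-based reference. The sketch below is deliberately not the vectorized solution the assignment asks for; it uses plain nested lists and no torch. The function name `multihead_matmul_reference` is hypothetical, and it assumes the `h` dimension is split contiguously, so head `n` owns hidden indices `[n*hd, (n+1)*hd)`.

```python
def multihead_matmul_reference(a1, w1, num_heads):
    """Loop-based reference: a1 is [b][s][h], w1 is [h][e]; returns [b][s][nh][e]."""
    h = len(w1)
    e = len(w1[0])
    assert h % num_heads == 0, "hidden_size must be divisible by num_heads"
    hd = h // num_heads
    out = []
    for seq in a1:                      # batch dimension
        seq_out = []
        for token in seq:               # sequence dimension
            heads = []
            for n in range(num_heads):  # assumption: head n owns a contiguous slice
                lo = n * hd
                row = [
                    sum(token[lo + k] * w1[lo + k][j] for k in range(hd))
                    for j in range(e)
                ]
                heads.append(row)
            seq_out.append(heads)
        out.append(seq_out)
    return out


# b=1, s=1, h=4, e=2, nh=2: one token split into two heads of size 2.
o2 = multihead_matmul_reference([[[1, 2, 3, 4]]], [[1, 0], [0, 1], [1, 1], [2, 0]], 2)
print(o2)  # [[[[1, 2], [11, 3]]]]
```

Summing the per-head rows, `[1 + 11, 2 + 3] = [12, 5]`, reproduces the naive product of the token with `W1`, which is a handy sanity check for a vectorized version.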
202-
203-
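<h4 id="Task-2-MalMul-with-importance"><a href="#Task-2-MalMul-with-importance" class="headerlink" title="Task 2: MatMul with importance"></a>Task 2: MatMul with importance</h4><p>Building on multi-head matrix multiplication, we introduce a probability tensor <code>P</code> of shape <code>[b, s]</code> that represents "importance". Each element of P gives the importance of the element at the corresponding position in <code>A1</code>. Based on these probabilities, the goal is to perform the matrix multiplication only on the "important" elements of each sequence. There are <code>total_important_seq_len</code> such elements in total, abbreviated <code>t</code>, and their results are gathered into the output tensor <code>O3</code> of shape <code>[t, nh, e]</code>.</p>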
204-
<p>To define precisely which elements are "important", we provide two optional parameters:</p>
205-
<ol>
206-
<li><code>top_p</code>: a float in the range <code>[0., 1.]</code>. Only elements whose probability is greater than or equal to <code>top_p</code> are considered "important". The default is <code>1.0</code>.</li>
207-
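<li><code>top_k</code>: an integer in the range <code>[1, ..., seq_len]</code>. For each sequence in the batch, only the <code>top_k</code> elements with the highest probability are considered "important". If <code>top_k</code> is not set (the default is <code>None</code>), this is equivalent to <code>top_k = seq_len</code>.</li>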
208-
</ol>
209-
<p>Note that an element is important only if it satisfies both conditions above.</p>
210-
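<h5 id="TODO-1"><a href="#TODO-1" class="headerlink" title="TODO"></a>TODO</h5><p>Complete the <strong>Task2</strong> part of <code>matmul_with_importance</code> and implement the importance-based multiplication described above. First, use the values of <code>top_p</code> and <code>top_k</code> to select the "important" elements from <code>A1</code>, forming a tensor <code>A3</code> of shape <code>[t, h]</code>. Then, following the multi-head matrix multiplication in <strong>Task1</strong>, output the tensor <code>O3</code> of shape <code>[t, nh, e]</code>.</p>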
211-
<div class="note info">
212-
<p>You can use <code>torch.topk</code> to compute the <code>topk</code> important elements.</p>
213-
</div>
214-
215-
<div class="note warning">
216-
<p>In all test cases, <code>top_p</code> and <code>top_k</code> are guaranteed to take values within their valid ranges.</p>
217-
</div>
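Review note on the Task 2 text in this hunk: the two filters combine as a logical AND, which a small pure-Python sketch can make concrete. The helper name `important_positions` is hypothetical. The text does not say how ties inside the top-k are broken or in what order the `t` selected elements are gathered, so the sketch assumes ties go to the lower index and results keep their original (batch, position) order.

```python
def important_positions(p, top_p=1.0, top_k=None):
    """Return (batch_index, seq_index) pairs that pass both the top_p and top_k filters."""
    selected = []
    for i, probs in enumerate(p):
        k = len(probs) if top_k is None else top_k
        # Indices of the k largest probabilities; sorted() is stable, so ties
        # go to the lower index (an assumption, not stated in the task text).
        ranked = sorted(range(len(probs)), key=lambda j: -probs[j])[:k]
        in_top_k = set(ranked)
        for j, prob in enumerate(probs):
            if j in in_top_k and prob >= top_p:
                selected.append((i, j))
    return selected


p = [[0.9, 0.2, 0.7], [0.4, 0.8, 0.6]]
print(important_positions(p, top_p=0.5, top_k=2))  # [(0, 0), (0, 2), (1, 1), (1, 2)]
```

Here `t` is the length of the returned list; gathering the rows of `A1` at those positions would give the `[t, h]` tensor `A3`.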
218-
219-
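<h4 id="Task-3-MalMul-with-grad"><a href="#Task-3-MalMul-with-grad" class="headerlink" title="Task 3: MatMul with grad"></a>Task 3: MatMul with grad</h4><p>In addition, if an optional gradient of the output tensor is provided (written <code>dO3</code>, with the same shape as <code>O3</code>), we must also compute the gradient of the input tensor (written <code>dA1</code>, with the same shape as <code>A1</code>) and the gradient of the weight tensor (written <code>dW1</code>, with the same shape as <code>W1</code>). If <code>dO3</code> is not provided, both <code>dA1</code> and <code>dW1</code> are returned as <code>None</code>.</p>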
220-
<h5 id="TODO-2"><a href="#TODO-2" class="headerlink" title="TODO"></a>TODO</h5><p>Complete the <strong>Task3</strong> part of <code>matmul_with_importance</code>. Refer to the two ways of computing gradients introduced in <strong>A0</strong>, and return the gradients of <code>A1</code> and <code>W1</code>.</p>
221-
<div class="note info">
222-
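<ol><li>If the <code>grad_output</code> argument is not provided, avoid computing gradients, to improve efficiency and save memory.</li><li>If the <code>grad_output</code> argument is provided, you may use PyTorch's autograd mechanism to compute the gradients, but watch for potential side effects, which may be checked in the tests.</li></ol>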
223-
</div>
224-
197+
<p>Assignment for A1</p>
198+
<!--noindex-->
199+
<div class="post-button">
200+
<a class="btn" href="/LLM-Blog/2025/06/14/A1-matmul/">
201+
Read more &raquo;
202+
</a>
203+
</div>
204+
<!--/noindex-->
205+
225206

226207
</div>
227208

@@ -376,6 +357,10 @@ <h3 id="Deploy-to-remote-sites"><a href="#Deploy-to-remote-sites" class="headerl
376357
<span class="site-state-item-name">posts</span>
377358
</a>
378359
</div>
360+
<div class="site-state-item site-state-tags">
361+
<span class="site-state-item-count">2</span>
362+
<span class="site-state-item-name">tags</span>
363+
</div>
379364
</nav>
380365
</div>
381366
