2525< meta property ="og:url " content ="https://big-trex.github.io/index.html ">
2626< meta property ="og:site_name " content ="LLM-Assignment-Doc ">
2727< meta property ="og:locale " content ="en_US ">
28- < meta property ="article:author " content ="John Doe ">
28+ < meta property ="article:author " content ="DeepEngine ">
2929< meta name ="twitter:card " content ="summary ">
3030
3131< link rel ="canonical " href ="https://big-trex.github.io/ ">
@@ -149,7 +149,7 @@ <h1 class="site-title">LLM-Assignment-Doc</h1>
149149
150150 < span hidden itemprop ="author " itemscope itemtype ="http://schema.org/Person ">
151151 < meta itemprop ="image " content ="/LLM-Blog/images/avatar.gif ">
152- < meta itemprop ="name " content ="John Doe ">
152+ < meta itemprop ="name " content ="DeepEngine ">
153153 < meta itemprop ="description " content ="">
154154 </ span >
155155
@@ -170,7 +170,7 @@ <h2 class="post-title" itemprop="name headline">
170170 < span class ="post-meta-item-text "> Posted on</ span >
171171
172172
173- < time title ="Created: 2025-06-14 12:57:11 / Modified: 12:57:57 " itemprop ="dateCreated datePublished " datetime ="2025-06-14T12:57:11+08:00 "> 2025-06-14</ time >
173+ < time title ="Created: 2025-06-14 12:57:11 / Modified: 13:07:13 " itemprop ="dateCreated datePublished " datetime ="2025-06-14T12:57:11+08:00 "> 2025-06-14</ time >
174174 </ span >
175175
176176
@@ -192,13 +192,10 @@ <h4 id="Task-1-MalMul-with-multi-head-variant"><a href="#Task-1-MalMul-with-mult
192192< p > Naive matrix multiplication iterates only over the < code > batch_size</ code > dimension of < code > A1</ code > : for each sequence index i, it computes < code > O1[i] = A1[i] @ W1</ code > , producing a tensor < code > O1</ code > of shape < code > [b, s, e]</ code > .</ p >
193193< p > In multi-head matrix multiplication, we first split the < code > h</ code > dimension of both the input tensor < code > A1</ code > and the weight tensor < code > W1</ code > evenly into < code > num_heads</ code > sub-dimensions (denoted < code > nh</ code > , the number of heads), which yields a 4D tensor < code > A2</ code > of shape < code > [b, s, nh, hd]</ code > and a 3D tensor < code > W2</ code > of shape < code > [nh, hd, e]</ code > . Next, for each sequence along the < code > batch_size</ code > dimension of < code > A2</ code > , we iterate over every < code > [s, hd]</ code > matrix along its < code > num_heads</ code > dimension and multiply it by the corresponding < code > [hd, e]</ code > matrix along the < code > num_heads</ code > dimension of W2. With all heads computed in parallel, the final output is a 4D tensor < code > O2</ code > of shape < code > [b, s, nh, e]</ code > .</ p >
194194< h5 id ="TODO "> < a href ="#TODO " class ="headerlink " title ="TODO "> </ a > TODO</ h5 > < p > Complete the < strong > Task1</ strong > part of < code > matmul_with_importance</ code > by implementing the multi-head matrix multiplication described above: take the tensors < code > A1</ code > and < code > W1</ code > as input and return the computed < code > O2</ code > .</ p >
195- < blockquote >
196- < p > [!IMPORTANT]</ p >
197- < ol >
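198- < li > The input tensors are A1 and W1; you need to convert them into A2 and W2 yourself before computing. Pay attention to the usage of, and differences between, torch functions such as < code > reshape</ code > , < code > view</ code > , < code > transpose</ code > , and < code > permute</ code > .</ li >
199- < li > Although the matrix multiplication is logically described as an iteration, do not implement it with for loops. Look up PyTorch's computation functions yourself, such as < code > @</ code > , < code > torch.bmm</ code > , < code > torch.mm</ code > , < code > torch.matmul</ code > , and < code > torch.einsum</ code > .</ li >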
200- </ ol >
201- </ blockquote >
195+ < div class ="note info ">
196+ < blockquote > < p > [!IMPORTANT]</ p > < ol > < li > The input tensors are A1 and W1; you need to convert them into A2 and W2 yourself before computing. Pay attention to the usage of, and differences between, torch functions such as < code > reshape</ code > , < code > view</ code > , < code > transpose</ code > , and < code > permute</ code > .</ li > < li > Although the matrix multiplication is logically described as an iteration, do not implement it with for loops. Look up PyTorch's computation functions yourself, such as < code > @</ code > , < code > torch.bmm</ code > , < code > torch.mm</ code > , < code > torch.matmul</ code > , and < code > torch.einsum</ code > .</ li > </ ol > </ blockquote >
197+ </ div >
198+
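The multi-head multiplication described for Task 1 could be sketched roughly as follows. This is an illustrative sketch only, not the assignment's reference solution: the function name `multi_head_matmul` is hypothetical, `W1` is assumed to have shape `[h, e]` (inferred from `O1[i] = A1[i] @ W1`), and `torch.einsum` is just one of the loop-free options the note lists.

```python
import torch


def multi_head_matmul(A1: torch.Tensor, W1: torch.Tensor, num_heads: int) -> torch.Tensor:
    """Hypothetical helper: split h into heads, then multiply head by head.

    Assumed shapes: A1 is [b, s, h], W1 is [h, e]; returns O2 of shape [b, s, nh, e].
    """
    b, s, h = A1.shape
    h_w, e = W1.shape
    assert h == h_w, "A1 and W1 must share the h dimension"
    assert h % num_heads == 0, "h must be divisible by num_heads"
    hd = h // num_heads

    # Split the h dimension into (nh, hd). reshape also handles
    # non-contiguous inputs, where view would raise an error.
    A2 = A1.reshape(b, s, num_heads, hd)   # [b, s, nh, hd]
    W2 = W1.reshape(num_heads, hd, e)      # [nh, hd, e]

    # For every batch b and head n: [s, hd] @ [hd, e] -> [s, e], without a Python loop.
    O2 = torch.einsum("bsnd,nde->bsne", A2, W2)
    return O2
```

One way to check such a sketch: summing `O2` over the head dimension should reproduce the naive product `A1 @ W1`, because the per-head products are partial sums over disjoint slices of `h`.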
202199< blockquote >
203200< p > [!NOTE]</ p >
204201< ol >
@@ -254,7 +251,7 @@ <h5 id="TODO-2"><a href="#TODO-2" class="headerlink" title="TODO"></a>TODO</h5><
254251
255252 < span hidden itemprop ="author " itemscope itemtype ="http://schema.org/Person ">
256253 < meta itemprop ="image " content ="/LLM-Blog/images/avatar.gif ">
257- < meta itemprop ="name " content ="John Doe ">
254+ < meta itemprop ="name " content ="DeepEngine ">
258255 < meta itemprop ="description " content ="">
259256 </ span >
260257
@@ -374,7 +371,7 @@ <h3 id="Deploy-to-remote-sites"><a href="#Deploy-to-remote-sites" class="headerl
374371
375372 < div class ="site-overview-wrap sidebar-panel ">
376373 < div class ="site-author motion-element " itemprop ="author " itemscope itemtype ="http://schema.org/Person ">
377- < p class ="site-author-name " itemprop ="name "> John Doe </ p >
374+ < p class ="site-author-name " itemprop ="name "> DeepEngine </ p >
378375 < div class ="site-description " itemprop ="description "> </ div >
379376</ div >
380377< div class ="site-state-wrap motion-element ">
@@ -414,7 +411,7 @@ <h3 id="Deploy-to-remote-sites"><a href="#Deploy-to-remote-sites" class="headerl
414411 < span class ="with-love ">
415412 < i class ="fa fa-heart "> </ i >
416413 </ span >
417- < span class ="author " itemprop ="copyrightHolder "> John Doe </ span >
414+ < span class ="author " itemprop ="copyrightHolder "> DeepEngine </ span >
418415</ div >
419416 < div class ="powered-by "> Powered by < a href ="https://hexo.io/ " class ="theme-link " rel ="noopener " target ="_blank "> Hexo</ a > & < a href ="https://muse.theme-next.org/ " class ="theme-link " rel ="noopener " target ="_blank "> NexT.Muse</ a >
420417 </ div >