Commit 2b44bc9

Add script for sync post
1 parent 087573f commit 2b44bc9

9 files changed

Lines changed: 815 additions & 4 deletions

hexo-site/package.json

Lines changed: 2 additions & 1 deletion
@@ -3,7 +3,8 @@
 "version": "0.0.0",
 "private": true,
 "scripts": {
-"build": "hexo generate",
+"sync-posts": "python ../scripts/sync_root_to_hexo_posts.py",
+"build": "npm run sync-posts && hexo generate",
 "clean": "hexo clean",
 "deploy": "hexo deploy",
 "server": "hexo server"
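
The new `build` script runs `sync-posts` before `hexo generate`, but the body of `scripts/sync_root_to_hexo_posts.py` is not shown on this page. The post diffs in this commit change three things: a blank line is added after the front matter, `/imgs/` image paths become `imgs/`, and the last line of one post is touched. The sketch below is a hypothetical normalization step that would produce the first two edits and also ensures a trailing newline. The name `normalize_post` and its behavior are assumptions made for illustration, not the contents of the actual script.

```python
import re

# Matches the start of a Markdown image whose target begins with "/imgs/".
IMG_ABS = re.compile(r"(!\[[^\]]*\]\()/imgs/")


def normalize_post(text: str) -> str:
    """Hypothetical normalization for one post (not the real sync script)."""
    lines = text.split("\n")

    # If the post starts with front matter, make sure a blank line follows
    # the closing '---' delimiter.
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                if i + 1 < len(lines) and lines[i + 1].strip() != "":
                    lines.insert(i + 1, "")
                break

    body = "\n".join(lines)

    # Rewrite site-absolute image paths to relative ones.
    body = IMG_ABS.sub(r"\1imgs/", body)

    # End the file with exactly the content plus a newline.
    if not body.endswith("\n"):
        body += "\n"
    return body


sample = "---\ntitle: t\n---\n# H\n![a](/imgs/x.png)\nend"
normalized = normalize_post(sample)
```

Running the function a second time on its own output leaves the text unchanged, which matters for a step that runs on every `npm run build`.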

hexo-site/source/_posts/2026-03-31-kl-divergence.md

Lines changed: 4 additions & 3 deletions
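@@ -3,6 +3,7 @@ title: Introduction to KL Divergence and Its Use in PPO
 date: 2026-03-31
 tags: [RL, Math]
 ---
+
 # Introduction
 KL divergence (Kullback-Leibler divergence) measures how similar two distributions are; MSE does not give the desired result for this purpose.

@@ -16,18 +17,18 @@ $D_{KL}(P||Q) = \sum_{i=1}^{N}[p(x_i)\log p(x_i) - p(x_i)\log q(x_i)]$

 So whose predicted distribution, Xiao Ming's or Xiao Hong's, is closer to the true distribution? KL divergence can measure the similarity between P1 and Q and between P2 and Q, and comparing the two values shows which is closer.

-![KL divergence illustration](/imgs/kl_example.png)
+![KL divergence illustration](imgs/kl_example.png)

 $KL1$ is smaller than $KL2$, so P1 is closer to Q.

 # Use in PPO
 To keep the Reward Model from causing overly large weight updates, a constraint term is added to the loss function; it can be understood as a KL divergence.

-![PPO loss illustration](/imgs/ppo_loss.png)
+![PPO loss illustration](imgs/ppo_loss.png)

 Here $\pi^{RL}_{\phi}$ is the probability distribution defined by the model weights after RL, and $\pi^{SFT}$ is the probability distribution defined by the model weights obtained after SFT.

 If $\gamma$ is 0, this is the plain PPO update; with a nonzero $\gamma$, the pretraining loss is also applied, which keeps the model from drifting too far toward the changes driven by the Reward Model.

 # Ref
-https://zhuanlan.zhihu.com/p/339613080
+https://zhuanlan.zhihu.com/p/339613080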
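
The comparison described in the post can be checked numerically. The sketch below implements the formula from the hunk header, $D_{KL}(P||Q)$, in plain Python. The distributions `q`, `p1`, and `p2` are made-up numbers chosen for illustration; the post's own values live in the image and are not reproduced here.

```python
import math


def kl_divergence(p, q):
    """Return D_KL(P || Q) = sum_i p_i * (log p_i - log q_i), in nats."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # the term 0 * log(0 / q_i) is taken to be 0
        if q_i == 0.0:
            return math.inf  # P puts mass where Q has none
        total += p_i * (math.log(p_i) - math.log(q_i))
    return total


# Hypothetical distributions (assumptions, not taken from the post).
q = [0.5, 0.3, 0.2]
p1 = [0.45, 0.35, 0.20]
p2 = [0.10, 0.20, 0.70]

kl1 = kl_divergence(p1, q)
kl2 = kl_divergence(p2, q)
closer = "P1" if kl1 < kl2 else "P2"
```

KL divergence is not symmetric, so `kl_divergence(p1, q)` and `kl_divergence(q, p1)` generally differ; the argument order here follows the formula as written in the post.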

hexo-site/source/_posts/2026-03-31-ppo.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ title: PPO (Proximal Policy Optimization) Study Notes
 date: 2026-03-31
 tags: [RL]
 ---
+
 # PPO (Proximal Policy Optimization) Study Notes

 ## 1. On-Policy vs. Off-Policy

hexo-site/source/_posts/2026-03-31-vllm-gdn-computation.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ title: GDN (GatedDeltaNet) Computation Flow in vLLM
 date: 2026-03-31
 tags: [vLLM]
 ---
+
 # GDN (GatedDeltaNet) Computation Flow in vLLM

 ## 1. Architecture Overview

hexo-site/source/_posts/2026-03-31-vllm-qsa-computation.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ title: QSA (Query-Side Aggregation) Computation Flow in vLLM
 date: 2026-03-31
 tags: [vLLM]
 ---
+
 # QSA (Query-Side Aggregation) Computation Flow in vLLM

 ## 1. QSA Overview

hexo-site/source/_posts/2026-03-31-vllm-quantization-rotate.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ title: Rotation Transforms in Quantization and Their vLLM Implementation
 date: 2026-03-31
 tags: [vLLM, Quantization]
 ---
+
 # Rotation Transforms in Quantization and Their vLLM Implementation

 ## 1. Background: Why Rotation Transforms Are Needed

hexo-site/source/_posts/2026-03-31-vllm-speculative-decoding.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ title: vLLM Speculative Decoding (Eagle / MTP): Implementation and Execution Flow
 date: 2026-03-31
 tags: [vLLM]
 ---
+
 # vLLM Speculative Decoding (Eagle / MTP): Implementation and Execution Flow

 ## Contents