Basic
Basic Mathematical Principles
Transformer Model Architecture
Embedding
Attention
- Full Attention
- Sparse Attention
MLP
MoE
VLM
Huggingface
Training
N-D Parallelism
- fsdp
- fsdp2
- torchtitan
- dtensor
- torch-tp
- pipeline-parallel
- ulysses-sp
- ring-attention
- context-parallel
- fsdp2-ep-tp
- deep-ep
- vocab-tp
- ib2nvlink
- device-mesh
Comm-Compute Overlap
Low-Precision Training
Load Balancing
LongContext
Dataload & Ckpt
Inference
Decode
KV-Cache
PD Disaggregation/Colocation
Quant
Kernel
RL (Reinforcement Learning)
RL Algo
RL System
Post-Training Recipes
MaaS
Hardware
Communication
Misc
Kernels