FSDP and Mixed-Precision Training
- Read through the Ultra-Scale Playbook
- Gradient accumulation
- FSDP2 publish
- FSDP2 and fp8 mixed precision training
- Pipeline parallelism schedules such as DualPipe / ZBV
- Long-context training (long-context-llm)
- DeepEP all-to-all communication optimization (deep-ep)
- Distributed checkpointing
- Streaming dataloader
- MoE
- DTensor
- Pathways and multi-controller / single-controller
- Ray, RLHF, and HybridFlow
- TPU
FP16 and BF16 Mixed-Precision Training
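As a rough illustration of why the two 16-bit formats behave differently in training, the sketch below rounds a few values to FP16 and to BF16 using only the Python standard library. The helper names `to_fp16` and `to_bf16` are made up for this note, and the BF16 path is a simulation (round-to-nearest-even on the upper 16 bits of a float32) that ignores NaN handling; it is not how any framework implements the cast.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE half precision (5 exponent bits, 10 mantissa bits)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the FP16 range; model that as inf.
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x: float) -> float:
    """Simulate bfloat16 (8 exponent bits, 7 mantissa bits) from float32 bits.

    Assumption: plain round-to-nearest-even, NaN inputs are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # keep only the upper 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Range: FP16 overflows just above 65504, BF16 keeps the float32 exponent range.
print(to_fp16(70000.0))        # inf
print(to_bf16(70000.0))        # 70144.0

# Precision: FP16 has 10 mantissa bits, BF16 only 7.
print(to_fp16(1.0 + 2**-10))   # 1.0009765625
print(to_bf16(1.0 + 2**-10))   # 1.0

# Small values: a gradient of 1e-8 flushes to zero in FP16 but survives in BF16.
print(to_fp16(1e-8))           # 0.0
print(to_bf16(1e-8) > 0.0)     # True
```

The last two lines show why FP16 training is usually paired with loss scaling, while BF16 training typically runs without it at the cost of coarser precision.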
FSDP natively supports mixed-precision training
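A minimal sketch of what this looks like with the PyTorch `FullyShardedDataParallel` wrapper, assuming a recent PyTorch install. The specific dtype choices (BF16 parameters, FP32 gradient reduction) are one common setup picked for illustration, not a recommendation taken from this note.

```python
import torch
from torch.distributed.fsdp import MixedPrecision

# Policy object only; building it needs no process group.
bf16_policy = MixedPrecision(
    param_dtype=torch.bfloat16,   # dtype used for forward/backward compute
    reduce_dtype=torch.float32,   # dtype used when reducing gradients
    buffer_dtype=torch.bfloat16,  # dtype for module buffers
)

# Applying it requires an initialized process group, so it is left commented out:
# from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
# model = FSDP(model, mixed_precision=bf16_policy)
```

With such a policy, FSDP keeps the sharded parameters in their original precision for the optimizer and casts to `param_dtype` for compute.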
FP8
FSDP and FP8