FSDP and Mixed-Precision Training

Read through ultrascale playbook
Grad accumulation
FSDP2 publish
FSDP2 and fp8 mixed precision training
Pipeline parallelism, e.g. DualPipe / ZBV
LongContext Training long-context-llm
DeepEP alltoall comm optimization deep-ep
Distributed ckpt
Streaming dataloader
MoE
DTensor
Pathway and multi-controller/single-controller
Ray and RLHF and Hybrid Flow
TPU

FP16 and BF16 Mixed-Precision Training
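
As a quick illustration of how the two 16-bit formats differ, the sketch below rounds a few values to FP16 and BF16 using only the standard library. The helper names `to_fp16` and `to_bf16` are made up for this note; BF16 is emulated by rounding the float32 bit pattern to its upper 16 bits, which is an approximation of what hardware does, not a reference implementation.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE half precision (5 exponent bits, 10 mantissa bits)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond the FP16 range; treat that as overflow to inf.
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x: float) -> float:
    """Round x to bfloat16 (8 exponent bits, 7 mantissa bits) via float32 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits that get discarded.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


if __name__ == "__main__":
    # 70000 exceeds the FP16 range, 1e-8 is below it, 1.001 probes precision near 1.
    for value in (70000.0, 1e-8, 1.001):
        print(f"{value!r:>10}  fp16={to_fp16(value)!r}  bf16={to_bf16(value)!r}")
```

FP16 overflows and underflows much earlier than BF16 but resolves finer steps near 1.0, which is why FP16 training usually needs loss scaling while BF16 usually does not.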

FSDP natively supports mixed-precision training.
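
In PyTorch this is configured with `torch.distributed.fsdp.MixedPrecision` (`param_dtype`, `reduce_dtype`, `buffer_dtype`) for FSDP1, or `MixedPrecisionPolicy` for FSDP2. The sketch below is not FSDP code. It is a standard-library simulation, under contrived worst-case assumptions, of two reasons why a common setup keeps FP32 master weights and reduces gradients in FP32 while computing in BF16. The helpers `to_fp32` and `to_bf16` and the chosen constants are hypothetical.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (double) to float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to bfloat16 by rounding the float32 bit pattern to 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


# (1) Weight update: a small step is rounded away when the weight lives in BF16.
update = 1e-3  # stands in for lr * grad (assumed constant for the demo)
w_bf16, w_fp32 = 1.0, 1.0
for _ in range(100):
    w_bf16 = to_bf16(w_bf16 - update)  # rounded back to 1.0 every time
    w_fp32 = to_fp32(w_fp32 - update)  # FP32 master weight keeps the update

# (2) Gradient reduction: a running sum rounded to BF16 after every addition
# stops growing once the addend falls below half a unit in the last place.
contribution = 2.0 ** -10  # same gradient value from each of 512 ranks (assumed)
sum_bf16, sum_fp32 = 0.0, 0.0
for _ in range(512):
    sum_bf16 = to_bf16(sum_bf16 + contribution)
    sum_fp32 = to_fp32(sum_fp32 + contribution)

if __name__ == "__main__":
    print("weight after 100 steps: bf16 =", w_bf16, " fp32 =", w_fp32)
    print("reduced gradient:       bf16 =", sum_bf16, " fp32 =", sum_fp32)
```

Real reductions do not necessarily sum in this order, so treat the second part as an upper bound on how badly low-precision accumulation can behave.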

FP8
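
FP8 has so few bits that values are normally multiplied by a scale before casting. The sketch below emulates the E4M3 format (4 exponent bits, 3 mantissa bits, largest finite value 448) in plain Python and applies a per-tensor scale derived from the absolute maximum. The function names are hypothetical, and the saturating behavior on overflow is an assumption of this sketch; real casts may behave differently.

```python
import math

E4M3_MAX = 448.0       # largest finite E4M3 value
E4M3_MIN_EXPONENT = -6  # exponent of the smallest normal value


def to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value, saturating at +-448 (assumed)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # frexp gives a = m * 2**e with 0.5 <= m < 1, so floor(log2(a)) == e - 1.
    exponent = max(math.frexp(a)[1] - 1, E4M3_MIN_EXPONENT)
    step = 2.0 ** (exponent - 3)  # 3 mantissa bits: 8 steps per power of two
    return sign * round(a / step) * step  # round() ties to even


def quantize_tensorwise(values):
    """Scale so the largest magnitude maps to 448, then round to E4M3."""
    amax = max(max(abs(v) for v in values), 1e-12)  # guard against all zeros
    scale = E4M3_MAX / amax
    return [to_e4m3(v * scale) for v in values], scale


if __name__ == "__main__":
    tensor = [0.0005, 0.02, -0.3, 1.5]
    quantized, scale = quantize_tensorwise(tensor)
    for original, q in zip(tensor, quantized):
        print(f"{original:>8}  unscaled={to_e4m3(original)!r}  "
              f"scaled+dequantized={q / scale!r}")
```

Without scaling, small entries flush to zero; with the scale, every entry lands in the normal range and keeps a relative error of a few percent.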

FSDP and FP8


TransformerBlock(
    (attention): Attention(
        (wq/wk/wv/wo): Float8Linear(in=4096, out=4096, bias=False) 
    )
    (feed_forward): FeedForward(
        (w1/w3): Float8Linear(in=4096, out=14336, bias=False)
        (w2): Float8Linear(in=14336, out=4096, bias=False)
    )
    (attention_norm / ffn_norm): RMSNorm()
)
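
A back-of-envelope calculation for the block above: how many weight elements sit in `Float8Linear` layers, and how many bytes FSDP would have to all-gather for them in BF16 versus FP8. This assumes the layer sizes exactly as printed, ignores the RMSNorm weights and any FP8 scale factors, and assumes the weights really are communicated in FP8 (torchao's `Float8LinearConfig` has an `enable_fsdp_float8_all_gather` option for this).

```python
# Layer sizes taken from the printout; every layer has bias=False.
ATTENTION_LAYERS = 4      # wq, wk, wv, wo
ATTENTION_SHAPE = (4096, 4096)
FFN_LAYERS = 3            # w1, w2, w3
FFN_SHAPE = (4096, 14336)  # element count is the same in either orientation

attention_params = ATTENTION_LAYERS * ATTENTION_SHAPE[0] * ATTENTION_SHAPE[1]
ffn_params = FFN_LAYERS * FFN_SHAPE[0] * FFN_SHAPE[1]
total_params = attention_params + ffn_params

BYTES_BF16 = 2
BYTES_FP8 = 1
bf16_bytes = total_params * BYTES_BF16
fp8_bytes = total_params * BYTES_FP8

if __name__ == "__main__":
    mib = 1024 ** 2
    print(f"Float8Linear params per block: {total_params:,}")
    print(f"all-gather per block in BF16:  {bf16_bytes / mib:.0f} MiB")
    print(f"all-gather per block in FP8:   {fp8_bytes / mib:.0f} MiB")
```

Under these assumptions the FP8 all-gather moves half the bytes of the BF16 one for each block, which is the main communication-side motivation for combining FSDP with FP8.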

Author houmin

Publish January 1, 0001

LastMod January 24, 2026

License CC BY-NC-ND 4.0
