After covering the model architecture, we will move on to how these models are trained: an introduction to the basics of DeepSpeed and Megatron, leading into the various parallelism strategies — Tensor / Pipeline / ZeRO / 3D Parallelism.
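As a warm-up for the Megatron material below, here is a minimal sketch of tensor parallelism: a linear layer's weight matrix is split column-wise across workers, each worker computes a partial output, and the partials are gathered. This uses NumPy to simulate two "workers" in one process; the shapes and variable names are illustrative, not Megatron's actual API.

```python
import numpy as np

# Megatron-style column-parallel linear layer, simulated on one process:
# split the weight matrix column-wise across two "workers", let each
# compute a partial output, then concatenate (an all-gather in practice).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # batch of 4, hidden size 8
W = rng.standard_normal((8, 16))   # full weight: 8 -> 16

# Shard the columns of W across 2 workers.
W_shards = np.split(W, 2, axis=1)  # two 8x8 shards

# Each worker computes its partial result independently.
partials = [x @ shard for shard in W_shards]

# "All-gather" the partial outputs along the feature dimension.
y_parallel = np.concatenate(partials, axis=1)

# The sharded computation matches the unsharded one.
assert np.allclose(y_parallel, x @ W)
```

The key property is that no worker ever needs the full weight matrix, which is what lets a single layer exceed one GPU's memory.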

References:

Microsoft's blog post introducing ZeRO and DeepSpeed:

https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/

Explainers on the various parallelism strategies:

DeepSpeed

Megatron-LM

GPipe

ZeRO

Colossal
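The GPipe entry in the list above is about pipeline parallelism: the model is cut into stages placed on different devices, and the batch is split into micro-batches so the stages can overlap. A minimal sketch, again simulating the stages serially in NumPy (the stage functions and sizes are made up for illustration):

```python
import numpy as np

# GPipe-style pipeline parallelism, simulated serially: the model is cut
# into two stages (as if on two devices), and the batch is split into
# micro-batches so stages can work on different micro-batches at once.
rng = np.random.default_rng(1)
W1 = rng.standard_normal((8, 8))   # stage 0's weights
W2 = rng.standard_normal((8, 4))   # stage 1's weights

stage0 = lambda h: np.maximum(h @ W1, 0)   # "device 0"
stage1 = lambda h: h @ W2                  # "device 1"

batch = rng.standard_normal((16, 8))
micro_batches = np.split(batch, 4)         # 4 micro-batches of size 4

# Pipelined forward: stage 1 starts as soon as the first micro-batch
# clears stage 0 (here collapsed into a simple loop on one process).
outputs = [stage1(stage0(mb)) for mb in micro_batches]
y_pipe = np.concatenate(outputs)

# Micro-batching does not change the result of the forward pass.
assert np.allclose(y_pipe, stage1(stage0(batch)))
```

In a real pipeline the micro-batch schedule is what hides the "bubble" of idle time at the start and end; the forward result is unchanged, as the assertion shows.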

state-of-gpt-2

Ray

OpenAI's 7,500-node Kubernetes cluster and its Ray Cluster stack

How OpenAI uses Ray