RL Scaling: Agent Infra
Mainstream scenarios
RL from human feedback
RL with verifiable rewards
RL with multi-turn agentic interaction
DeepSpeed-Chat
OpenRLHF
FlexRLHF