RL Scaling: Off-Policy vs. On-Policy
References
- https://fengyao.notion.site/off-policy-rl
- https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference
- https://zhuanlan.zhihu.com/p/1938184527752176241
- https://arxiv.org/pdf/2506.13585#page=8 (MiniMax)
- https://bytedance.larkoffice.com/docx/ZeP0d3pIdoHVwOx0DpIcrCPInRc
- https://fengyao.notion.site/flash-rl