RL Scaling: Off-Policy vs On-Policy

References

https://fengyao.notion.site/off-policy-rl
https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference
https://zhuanlan.zhihu.com/p/1938184527752176241
https://arxiv.org/pdf/2506.13585#page=8 (MiniMax)
https://bytedance.larkoffice.com/docx/ZeP0d3pIdoHVwOx0DpIcrCPInRc
https://fengyao.notion.site/flash-rl

Author houmin

LastMod November 9, 2025

License CC BY-NC-ND 4.0
