veRL Reinforcement Learning Course
Algo: ppo, RM, reward-model
Infra: hybrid-engine, partial-rollout, stream-rollout, rl-math, rl-colocate, rl-async
Application: search-r1, deep-researcher, retool, dapo, deep-eyes, agent-r1
Author: houmin
Publish: January 1, 0001
LastMod: November 9, 2025
License: CC BY-NC-ND 4.0