2025-04-24-insights

Posted on 2025-04-29 Updated on 2025-05-01 Category: Arxiv-Insights

Process Reward Models That Think

A few days ago DeepSeek put out a generative ORM that models a critic-list, and today a generative PRM has come out as well, reaching new SOTA on many reward-related benchmarks.

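That said, I'm starting to feel that for math/code, where rule-based verifiers are available, building an RM doesn't seem to be of much use. I wonder whether the comparisons could instead be run in agent, creative-writing, or instruction-following settings.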