
2024-10-25-insights

Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs

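A technical report. The authors apply a whole bag of data-filtering tricks and end up with only 80k training examples, and they find that the reward model trained on just these 80k examples is actually the best one.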

AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

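The official OSWorld team has shown up to climb the OSWorld leaderboard themselves. The authors built an agent framework that pushes the OSWorld score to 25%, knocking Claude off the top.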